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ABSTRACT 


A  traditional  aim  of  planning  systems  is  the  optimal 
trade-off  of  cost  of  planning  and  cost  of  execution.  Such 
optimization  of  cost  is  the  subject  of  classical 
statistical  decision  theory;  a  synthesis  seems  called  for. 

The  method  of  classical  decision  theory  is  to  model  a 
problem  with  a  decision  tree,  then  to  compute  expected 
costs  by  backward  induction.  This  cannot  be  used  here; 
building  the  tree  is  doing  the  planning.  A  compromise  is 
proposed,  of  using  a  decision  tree  only  one  level  deep, 
and  then  using  cost  estimates  for  the  tip  nodes.  This  is 
based  on  traditional  Artificial  Intelligence  work  on 
mini-max  game  playing.  A*  search,  and  MULTIPLE;  however, 
this  proposal  uses  changes  in  both  probability  and 
expected  cost.  A  control  structure  in  the  style  of 
MULTIPLE  is  developed,  unfortunately  requiring  a  number  of 
ad  hoc  steps. 

A  hand-simulated  STRIPS-like  robot  is  described, 
using  a  simplified  form  of  the  proposed  decision  method. 
The  workability  and  advantage  of  the  method  are 
demonstrated.  Hope  is  offered  for  the  ultimate  development 
of  a  theory  of  optimal  planning  free  of  ad  hoc  reasoning. 
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CHAPTER  1 


INTRODUCTION 


A  frequently  cited  desirable  feature  of  a  planning 
system  is  that  It  should  optimally  trade  off  cost  of 
planning  and  cost  of  execution.  However,.  how  this 
optimization  is  to  be  achieved  is  seldom  specified; 
usually  merely  the  admonition  to  "keep  this  in  mind  when 
writing  programs”  is  given.  An  explicit  framework  for 
achieving  such  optimality  is  provided  by  statistical 
decision  theory.  The  possibility  of  applying  decision 
theoretic  methods  to  traditional  Artificial  Intelligence 
planning  systems  inspired  the  present  work. 


The  implication  is  the  use  of  a  numerical  measure  of 
worth  (utility)  to  make  the  decisions  involved  in 
producing  a  plan.  This  idea  is  not  new  nor  peculiar  to  a 
decision  theoretic  approach.  The  best  known  search 
algorithm  of  Artificial  Intelligence,  A* ,  has  this  fora, 
as  does  mini-max  game  playing  and  the  theorem  prover 
MULTIPLE. 


Work  of  a  specifically  decision  theoretic  motivation 
is  much  less  common.  Significant  examples  include  the 
PERCY  simulation  of  Jacobs  and  Kiefer  [6],  the  JASON  robot 
of  Coles  et  al.  [  1 ],  and  particularly  the  work  of  Feldman 
[ 4 , 3  j .  Feldman* s  work  especially  was  an  inspiration  to 
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this  thesis,  particularly  in  the  application  of  decision 
theory  to  a  STRIPS -like  planning  system.  This  related  work 
is  reviewed  in  Chapter  2. 

A  very  desirable  feature  of  a  planning  system  is  the 
ability  to  back  up,  that  is,  to  reconsider  an  alternative 
previously  bypassed,  when  the  initial  choice  fails  (or 
merely  becomes  less  attractive) .  This  is  dealt  with  only 
in  cursory  fashion  by  the  other  workers.  Feldman  deals 
only  with  outright  failure,  and  this  only  in  passing. 
Including  reconsideration  explicitly  in  a  planning  tree 
(as  Coles  et  al.  [  1 ]  do  with  JASON)  quickly  complicates 
the  tree.  For  example,  consider  a  simple  planning  tree 
with  two  alternative  plans: 


_ 3  OR  i _ 

3  \  S  I 

i  I 

PLANA  PLANE 


If  PLANA  has  a  cost  of  4  and  PLANE  of  6,  PLANA  is  the 
obvious  choice.  Suppose,  however,  PLANE  really  consists  of 
two  steps,  the  initial  step  costing  1  (after  which  we  may 
reconsider)  and  specifying  the  cost  of  the  second  step  as 
10  (with  probability  .5)  or  0  (with  probability  .5).  Then 
while  the  expected  cost  of  PLANE  is  still  6,  it  is  not  the 
best  choice.  A  complete  decision  tree  (Figure  1)  shows 
why;  if  it  results  that  step  two  of  PLANE  will  cost  10,  we 
reconsider  PLANA  and  choose  it,  at  a  cost  of  only  4.  Thus 
allowing  for  reconsideration,  PLANB  has  an  expected  cost 


' 


. 

. 


- 

. 


3 


RECONSIDER 


\ 


FIG.  1 


RECONSIDERATION 


- 


4 


of  only 

3. 

This  is  treated 

more 

fully  in 

Section  3.2. 

If 

an 

alternative  may 

have 

more  tha 

n  two 

steps,  that 

is,  more 

than  one  point 

where 

more  acc 

urate 

knowledge  of 

remaining  cost  is  acquired,  things  grow  exponentially  more 
complex.  Compounding  this  with  already  complex  planning 
trees  is  clearly  infeasible  in  general.  An  approximation 
is  needed,  to  allow  us  to  include  the  significant  effects 
of  reconsideration  without  such  complexity.  The 
approximation  investigated  here  limits  an  alternative  to 
two  steps,  for  the  purpose  of  making  a  choice.  Thus  the 
first  order  effects  of  reconsideration  will  be  handled 
properly,  hopefully  yielding  adequate  decision  making 
behavior  when  the  higher  order  effects  are  small. 

Such  drastic  forward  pruning  would  be  expected  to 
give  severe  degradation  of  performance  in  a  game-playing 
environment.  However,  a  planning  structure  should  be 
designed  to  make  the  most  significant  information  the 
first  available;  this  should  limit  the  degradation  in  the 
planning  environment.  In  fact,  for  some  problems,  one 
level  of  expansion  is  totally  adequate;  such  an  example  is 
discussed  in  Section  3.1. 


This  results  in  a 
are  charact erizea  as 
has  an  expected  cost  te 
information  about  exp 
cost,  termed  t1  and  pro 


planning  system  where  alternatives 
two  step  processes.  The  first  step 
rmed  s,  after  which  more  precise 
ected  remaining  cost  (second  step 
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become  available.  These  refined  estimates  obey  a  bivariate 
distribution  function  f(t*,P®)«  The  numerical 
characterization  of  an  alternative  consists  then  of  the 
number  s  and  the  distribution  f,  This  is  treated  fully  in 
Section  3.3.  In  a  practical  implementation ,  some  numerical 
approximation  of  the  distribution  function  flt*,P4)  must 
be  used.  Possibilities  are  discussed  in  Section  3.4. 

The  control  structure  then  would  be  like  that  of 
MULTIPLE.  As  expansion  of  a  top  node  produces  a  new 
characterization  for  the  tip  process,  this 
characterization  is  "backed  up"  through  the  ancestral  AND 
and  OR  nodes  to  the  root;  choices  along  the  way  are 
re-evaluated.  How  this  backing  up  is  accomplished  is  the 
subject  of  Section  3.5, 

The  proposed  characterization  approximates  ail 
processes  by  two  step  processes.  Multi-step  processes  may 
be  partitioned  into  two  steps  in  many  ways.  Criteria  for 
choosing  a  partitioning  are  discussed  In  Section  3,6, 

It  is  possible  for  processes  invoked  in  planning  to 
run  on  forever  in  an  attempt  to  achieve  a  goal,  which  may 
result  in  the  expected  cost  of  the  process  being  infinite. 
This  may  be  so  even  of  a  desirable  process;  for  example, 
one  which  succeeds  with  zero  cost  99  per  cent  of  the  time, 
and  goes  on  forever  the  remaining  1  per  cent.  These  cases 
are  covered  in  Section  3.  7. 
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The  proposed  one  step  look-ahead  decision  method  is 
worked  out  in  a  detailed  example  in  Chapter  4.  A 
STEIPS-Iike  robot  is  used.  In  making  the  descent  from  the 
lofty  theoretical  heights  of  Chapter  3  to  the  real  world, 
considerable  simplification  is  necessary,  the  principal 
one  being  to  use  an  estimate  for  probability  of  success 
which  is  uniformly  1.  Thus  the  distribution  of  revised 
cost  flt*,p*)  becomes  a  simple  one  dimensional  function 
f{t8).  Execution  of  the  proposed  planning  system  is 
elaborated  fully  by  hand  for  several  example  goals.  These 
demonstrate  the  advantage  of  the  proposed  look-ahead 
method  over  simple  expected  cost  methods. 

While  even  with  these  simplified  examples  the  worth 
of  the  proposed  planning  method  remains,  a  number  of 
details  of  implementing  the  full  proposal  are  unspecified, 
and  remain  open  questions.  These  difficulties  and 
possibilites  are  discussed  In  the  concluding  chapter. 


CHAPTER  2 


RELATED  WORK 


2*_I  MULTIPLE 

Previous  systems  have  endeavored  to  use  quantitative 
measures  of  worth  to  guide  planning.  Of  these,  the  one 
most  directly  related  to  the  current  work  is  Slagle's 
MULTIPLE  [12J.  Here,  probability  of  success  was  the  sole 
measure  of  worth,  and  remaining  cost  was  approached  only 
indirectly,  as  rate  of  change  of  probability  with  effort. 
The  big  advantage  is  that  probability  can  be  "backed  up" 
in  a  natural  way  over  the  logical  connectives  AND  and  OR 
in  a  problem  reduction  tree.  Logical  extension  allows  the 
same  for  the  derivatives 

P[  X  AND  Y  ]  =  P£  X  ]P[  Y  ] 

P[X  OR  Yj  =  1  -  i1-P[XJ)  l1-P[Y]) 

_d_P[  X  AND  YJ  =  Pf  X  1  d  Pf  Y  1  +  Pf  Y  1  d  PC  X  1 
at  dt  dt 

_d_P[X  OH  YJ  =  1 1  -  Pf  X  J)  _d_P[  Y  J  +  <1-P[  Yj)_d_P[X  j 

dt  dt  dt 

where  X  and  Y  are  subgoals.  This  is  also  done  for  change 
of  goal  probability  for  effort  T  invested  in  subgoal  X 
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d  PfX  AND  Y]| 

d  PfX  AND  Y] 

_d_  p[X3 

dT 

= 

dP[  X] 

dT 

X 

X 

d  P[  X  OR  Y  ]  | 

P[X] 

dT  =  i  1  -PC  Y  j) 

dT 

X  j 

X 

The  merit  of  working  on  any  subgoal  was  defined  as  the 
rate  of  change  of  top-goal  probability  for  effort  invested 
in  that  subgoal.  The  (self)  merits  and  probabilities  of 
tip  subgoais  were  he ur istically  evaluated  (actually 
estimated  by  a  Learning  Program,  but  this  is  not  relevant 
here) .  The  merit  of  higher  level  nodes  was  simply  the 
maximum  merit  among  successors,  since  the  "choice”  subgoal 
to  attack  next  would  be  that  of  greatest  merit.  At  each 
elaboration,  these  parameters  of  probability,  merit,  and 
choice  could  be  backed  up  to  the  top  node  by  considering 
only  those  nodes  on  the  path  to  the  node  elaborated,  and 
their  sibling  nodes.  Thus,  the  cost  of  evaluating  a  node 
grows  only  linearly  with  the  depth  of  the  node;  this  is  a 
big  advantage  of  MULTIPLE,  and  a  feature  desirable  to 
retain  in  any  extension  of  the  system. 

However,  this  is  intimately  associated  with  a  most 
frequent  criticism  of  MULTIPLE;  the  assumption  of 
independence  among  subgoals.  The  naivete  of  such  an 
assumption  about  conjoined  subgoals  in  a  robot  problem 
solver  has  been  more  than  adequately  demonstrated  by 
Sacerdoti  [8]  in  ABSTRIPS  and  Siklossv  [10]  in  LAWALY. 
Here,  the  idea  of  "remaining  freedom”  (Siklossy) 
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determines  the  ordering  of  conjuncts.  This  points  up  a 
significant  difference  between  the  dynamic  world  of  a 
robot  problem  soiyer,  and  the  static  theorem  proving  model 
which  is  the  basis  of  MULTIPLE.  Contrary  to  the  notion  of 
PLANNER,  there  is  a  major  difference  between  proving  a 
proposition  and  achieving  a  goal.  The  difference  arises 
because  achieving  a  subgoal  may  alter  the  state  of  the 
world  relevant  to  achieving  (or  proving)  the  next  subgoal. 
Because  working  on  ?tearliern  subgoals  increases  the 
specificity  of  later  subgoals,  problem  reduction  leads  to 
highly  interdependent  subgoals.  Thus  the  derivative  of 
probability  (and  cost)  of  a  subgoal  with  respect  to  work 
performed  on  other  subgoals  is  (generally)  nonzero.  A  good 
example  is  the  goal  ((ROBOT  INROOM  D)  AND  (B0X1  NEXTG 
B0X2) )  discussed  in  relation  to  STRIPS  in  Section  2.6.1. 

There  are  ways  of  limiting  the  action  of  MULTIPLE  or 
similar  strategies  to  avoid  some  of  these  interdependence 
problems;  these  are  discussed  elsewhere  in  relation  to  the 
application  of  my  proposal.  However,  there  is  another 
problem  with  MULTIPLE  in  a  dynamic  environment.  Consider  a 
world  in  which  any  legal  goal  is  achievable.  It  is  still 
possible  in  such  a  world  for  some  solutions  to  be  shorter 
than  others,  or  easier  to  find:  we  can  do  better  than 
random  choice.  How  can  MULTIPLE  help  us  solve  problems 
efficiently?  The  merit  of  every  subgoal  will  be  zero,  that 
is,  MULTIPLE  provides  no  guidance  at  all.  Clearly, 
investment  of  effort  can  have  salutary  effects  other  than 
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altering  probability  of  success.  Ignoring  this  fact  is 
perhaps  the  greatest  deficiency  of  MULTIPLE. 

2. 2  PERCY 

The  evolution  of  a  planning  process  toward  a 
particular  goal  can  be  modelled  as  a  game  against  the 
"environment1* .  The  planner  makes  decisions  among  a  set  of 
possibilities.  At  alternate  moves*  the  "environment" 
specifies  outcomes*  much  like  an  opponent  in  a  chess  game 
except  that  this  opponent  is  random  rather  than  fighting 
back.  This  model  has  been  explored  by  Jacobs  and  Kiefer 
[6]  with  their  PERCY  simulation,  PERCY  is  a  simple'*  minded 
insect  with  a  single  task:  building  a  nest.  In  PERCY’S 
world*  there  are  a  total  of  only  12  states  (termed 
outcomes) *  and  for  each  outcome*  there  is  specified  a  set 
of  legal  decisions  (cardinality  <  4)  which  PERCY  can  make. 
There  is  a  utility  measure  on  the  outcomes*  and  a 
probability  distribution  over  the  set  of  outcomes  for  each 
legal  outcome-decision  pair.  These  utility  distribution s 
are  learned  by  a  Bayesian  update  of  estimates. 

A  complete  optimal  decision  method  would  be  to 
generate  a  complete  tree  of  all  possibilities  to  the  tip 
nodes  of  task  completion*  where  the  utilities  of  outcomes 
are  trivially  known.  Then*  the  utilities  of  higher 
outcomes  and  decisions  can  be  calculated  by  backward 
induction:  the  utility  of  an  outcome  is  the  maximum  of  the 
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utilities  of  successor  decisions,  and  the  utility  of  a 
decision  is  the  expected  value  of  the  successor  outcomes. 
This  is  in  contrast  to  the  mini-max  backup  of  usual  game 
playing;  the  authors  point  out  that  mini-max  backup  is  a 
special  case  where  the  probability  distribution  of 
outcomes  is  such  that  the  minimum  utility  has  probability 
1.  In  PERCY’S  world,  the  tree  is  infinite,  so  this  cannot 
be  done;  but  like  other  situations,  the  planner  in  fact 
expands  the  tree  to  a  limited  depth,  and  uses  utility 
estimates  at  tip  nodes.  In  PERCY 3  s  case,  the  depth  of 
expansion  is  one  --  only  a  single  step  of  look-ahead.  That 
is,  it  evaluates 

u(d)  =  ^>  P[  outcome!  d]  u  {outcome) 
outcomes 

the  expected  utility  of  decision  d,  for  all  decisions,  and 
makes  the  decision  of  maximum  utility. 

The  tremendously  simplified  type  of  problem  solver 
generated  to  illustrate  this  approach  shows  its 
deficiencies.  It  is  a  state  space  formulation  {as  opposed 
to  problem  reduction) ,  and  only  a  simple  world  can  be 
modelled  by  a  workable  number  of  states.  Like  MULTIPLE  or 
A*,  PERCY  relies  on  complete  elaboration  of  a  goal  to  make 
Its  utility  estimates.  Despite  these  simplifications,  it 
seems  that  one  level  of  look-ahead  is  all  that  is 
practical.  However,  the  adequacy  of  such  a  myopic  strategy 
is  demonstrated  on  PERCY’S  tasks  which  usually  require 


, 
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many  steps  --  another  example  of  locally  determined 
behavior  satisfying  global  goals. 


2 . 3  JASON 


Coles  et  al.  [  1  j  have  incorporated  decision  analysis 
methods  in  their  JASON  robot.  They  note  the  connection 
with  Feldman’s  work: 


In  a  more  recent  paper  by  Feldman  and  Sproull 
this  same  problem  (the  monkey  and  bananas 
problem)  has  served  as  the  basis  for  a 
decision- theoret ic  approach.  Although  they  are 
much  more  concerned  with  the  use  of  a  numerical 
utility  function  during  planning  to  develop  a 
more  efficient  strategy,  we  have  independently 
reached  the  same  conclusion  regarding  the 
positive  value  of  joining  decision  analysis  with 
a  symbolic  robot  problem  solver  to  facilitate 
intelligent  decision  making  under  conditions  of 
uncertainty.  ...  In  particular,  using  this 
approach,  we  can  create  a  decision  tree  that 
allows  one  to  "roll  back"  the  consequences  of 
further  information  gathering  operations  in 
comparison  with  direct  action. 


In  JASON’s  case,  "information  gathering"  is  limited  to 
exploration  of  the  external  environment,  and  does  not 


allow  for  elaboration  of  new  information  through  planning. 
An  even  more  serious  deficiency  is  revealed  by  the 


assignment  of  costs: 

Now  in  this  formulation,  although  "thinking"  is 
assumed  to  be  free  of  charge,  every  operation 
JASON  can  carry  out  in  the  real  world  is  energy 
(cost)  consuming. 

Of  course,  when  thinking  is  free,  one  can  use  classical 
decision  theory  directly:  consider  ail  consequences  of  all 
decisions,  and  compute  the  utilities  by  backward 
induction.  This  is  what  JASON  does.  Fortunately,  for  the 


. 

. 


I 
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world  and  goal  discussed,  the  resulting  tree  has  only  46 
nodes,  a  direct  consequence  of  the  simple  formulation.  In 
a  more  realistic  world,  of  course,  the  complexity  of  this 
method  explodes  hopelessly;  thinking,  no  matter  how  cheap, 
can  never  compensate  for  exponentially  proliferating 
alternatives  in  an  exhaustive  search.  So  while  these 
authors  have  explored  some  of  the  advantages  of  decision 
theory,  they  have  avoided  a  central  question:  how  to  plan 
optimally,  taking  into  account  the  cost  of  planning. 
Precisely  this  question  prompted  the  present  work. 


2  A* 

The  idea  of  using  a  measure  of  remaining  cost  to 
guide  problem  solving  search  is  certainly  not  new;  the  A# 
algorithm  of  Hart,  Nilsson,  and  Raphael  [7]  is  a  prime 
example.  As  first  introduced,  it  uses  a  state-space 
representation  of  the  problem  solving  environment. 
However,  extension  to  a  problem  reduction  framework  is  not 
difficult,  and  has  also  been  pursued  by  Nilsson  [7j.  Major 
differences  are  the  inclusion  of  AND  nodes,  and  a 
conceputal  reversal  to  proceed  backwards,  producing 
reduced  goals  from  previous  ones  until  they  are  trivially 
satis  fied . 


Further,  in 
identify  a  set 
idea  of  a  "state" 


a  STRIPS-like  world,  it  is  easy  to 
of  predicate  calculus  assertions  with  the 
„  However,  a  difficulty  results  because  a 


■ 
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goal  is  not  a  simple  state,  but  a  set  of  states  satisfying 
the  goal  description.  While  there  is  no  theoretical 
problem  in  introducing  suitable  equivalence  classes,  a 
practical  problem  arises  in  devising  suitable  heuristic 
cost  estimates.  This  is  precisely  the  problem  which 
surfaces  in  the  snowshovel  search  (Section  3.1):  we  need 
cost  estimates  for  parameterized  plans. 


There  is  another  problem  with  direct  application  of 
A*.  It  proceeds  by  expanding  a  node  fully,  then  choosing  a 
minimum  cost  successor.  The  inadequacy  of  such  an  approach 
is  obvious  when  the  number  of  successors  is  large,  or 
infinite.  Full  expansion  is  unfortunately  a  necessary 


feature  of  A* ,  since 


is 


essential  to  A*’s 


"admissibility” .  Rather,  this  full  expansion  is  a 
necessary  result  of  the  admissibility  requirement  that 
A  * 3  s  heuristic  underestimate  true  cost.  In  the  case  of  a 
parameterized  goal  where  cost  may  have  a  continuous 
distribution,  the  probability  of  any  instance  achieving 
the  minimum  is  zero,  so  we  expect  full  expansion  to  take 
place.  Clearly,  we  must  abandon  admissibility  in  favor  of 
optimality.  My  proposal  can  be  viewed  as  an  extension  of 
A*  in  the  sense  of  using  a  heuristic  to  evaluate  remaining 
cost.  However,  a  simple  remaining-cost  measure  is 


inadequate,  as  I  argue  in  Section  3.2. 


■ 

' 

/ 

-  ’ 


15 


2  , 5  Feldman 1 s  work 


2,5,1  Image  process ing 


Advocacy  of  applying  decision  theory  to  problem 
solving  in  Artificial  Intelligence  is  rare:  the  work  of 
Feldman  is  an  exception  and  mast  take  much  credit  for 
inspiration  of  the  current  work.  Arguments  in  favor  of 
applying  decision  theory  to  Artificial  Intelligence  are 
cited  by  Feldman  and  Yakimovsky  [4], 


•  •  • 


choice  between  alternative  courses  of 
action  is  often  inherently  numerical.  One 
chooses  the  cheapest,  fastest,  strongest,  etc. 
alternative. 


A  related  point  is  the  prevalence  of 
probablistic  judgements  in  the  world. 


.  .  .  ’comparing  the  incomparable’ .  If  flying  is 
faster  and  safer  than  driving,  but  more 
expensive  and  subject  to  delay,  how  can  we 
choose  which  to  do? 


In  the  absence  of  a  OF  [utilities  function],  a 
heuristic  program  apparently  must  have  rules 
covering  all  possible  combinations  of  goals  and 
circumstances.  The  addition  of  new  entities  will 
require  significant  reprogramming. 
.  .  .  Decision  theory  provides  a  uniform  way  of 
treating  information  related  to  choosing  a 
course  of  action,  given  the  relevant  utility  and 
probability  values. 


I  find  this  a 
against  procedural  e 
unfortunately,  not 


most  convincing  argument,  an  argument 
mbedding  of  decision  theory.  It  is. 


applied  in  their  picture  partitioning 


. 


. 


' 


■ 
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system. 

The  other  basic  method,  [other  than  hierarchical 
planning]  for  planning  is  to  attempt  to  measure 
progress  toward  a  goal  and  to  mix  planning  with 
action.  Obviously  enough,  a  uniform  measure  of 
progress  toward  a  goal  involves  a  UF. 

This  view  of  a  utility  function  is  in  the  spirit  of 
A* ,  but  I  shall  argue  that  it  must  be  combined  with 
hierarchical  planning  to  be  practical. 

Feldman  and  Yakimovsky  [4]  describe  application  of 
decision  theory  to  the  problem  of  segmenting  an  image  into 
meaningful  regions.  The  goal  was  a  front-end  region 
analyzer  (therefore  fast  and  unsophisticated)  which  was 
still  independent  of  absolute  criteria  (example:  boundary 
strength)  .  The  chosen  utility  function  was  the  probability 
of  a  correct  interpretation;  (an  interpretation  is  a 
partition  and  a  labelling  of  each  region) .  Regions  are 
"grown”  by  merging  2  adjacent  regions  at  each  iteration; 
basic  decisions  are  which  regions  merge,  and  when  to  stop. 
At  each  iteration,  upper  and  lower  bounds  on  the 
probability  (utility)  are  computed.  When  these  begin  to 
decrease,  the  system  stops  growing  and  assigns  an 
interpretation  to  the  partition  of  highest  utility. 


Despite 

all 

the  talk 

of  decision  analysis. 

this 

system  dees 

not 

seem  to 

follow  the  usual  form 

of 

elaborating 

successions  of 

alternating  decisions 

and 

probablistic  outcomes.  No  backing  up  of  utilities  is  used. 
Rather,  the  utility  function  idea  is  used  as  an 


. 


. 


’ 


. 
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inspiration  for  an  A*-like  heuristic  of  remaining  cost 
(actually  the  inverse  of  remaining  cost;  the  goal  is 
achieved  when  probability  is  maximized  rather  than 
remaining  cost  minimized) .  The  problem  representation  is 
certainly  of  the  state-space  variety:  we  are  concerned 
with  transitions  involving  merging  two  regions.  So,  while 
the  arguments  the  authors  cite  in  favor  of  using  a 
decision  theory  approach  are  convincing,  their  example 
does  not  demonstrate  an  explicit  application, 

2^ 5^2  Monkey  and  bananas  problem 

Feldman  and  Sproull  [3]  have  investigated  many  of  the 
aspects  of  applying  decision  theory  to  problem  solving 
through  consideration  of  a  robot  symbolic  problem  solver. 
They  discuss  a  STRIPS  type  robot  solving  the  monkey  and 
bananas  problem,  giving  a  solution  like  WALKTQ  (BOX)  , 
PUSHTO  (BOX, BANANAS)  ,  CLIMB  (BOX),  CONSUME  (BANANAS)  .  They 
point  out  that 

,  .  .  the  splan'  itself  is  trivial.  ...  It  is 
the  application  of  this  plan  to  the  world 
situation  which  is  difficult.  We  believe  that 
much  intelligent  activity  is  characterized  by 
complex  applications  of  simple  plans  and  this 
belief  has  led  us  to  concentrate  on  the  closely 
related  questions  of  plan  elaboration  and 
execution. 

In  this  context,  plan  elaboration  is  the  choice  among 
several  alternative  instantiations  of  BOX.  The  approach  is 
to  evaluate  the  utilities  of  the  alternatives,  and  select 
the  one  of  highest  utility: 


.  t 


. 


I* 

. 


Because  the  utility  of  a  plan  can  be  used  to 
compare  the  merits  of  competing  plans,  it  can  be 
used  to  guide  a  search  for  good  plans.  The  basic 
idea  is  to  search  by  expanding  paths  of  greatest 
expected  utility. 

I  criticize  this  approach  elsewhere  in  this  thesis. 

The  authors  are  not  unaware  of  the  inadequacy  of 
using  expected  cost  only;  they  talk  of  using  bounds  on 
cost  to  prune  planning  trees  (an  extension  of  A#*s  lower 
bounding) .  This  approach  also  has  obvious  shortcomings. 

They  consider  the  possibility  of  planning  to  recover 

from  various  "failures”,  and  point  out  that 

This  process  can  be  carried  on  indefinitely,  but 
.  »  .  the  cost  of  additional  planning  may  exceed 
the  slight  improvement  in  expected  utility. 

A  similar  issue  is  the  insertion  of  information  gathering 

steps,  and  in  general  any  kind  of  "procedural"  planning, 

where  some  decisions  must  be  made  at  execution.  Instead  of 

elaborating  a  plan  for  failure  recovery,  they  propose 

...  an  estimate  of  the  utility  of  fixing  up 
the  failure  is  the  current  utility  measurement 
of  the  top-level  alternative. 

I  suggest  use  of  this  substitution  of  alternatives  in 
situations  other  than  outright  failure.  While  the  authors 
note  this  interdependence  of  utilities  ("  .  .  .  the  vision 

and  action  planning  activities  each  make  use  of  the 
other *s  estimates.")  they  do  not  comment  on  resolving  the 
implicit  deadlock.  Rather,  they  merely  note  the  more 
general  problem: 


. 
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. 
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Unfortunately,  specifying  a  utility  function 
that  reflects  the  benefits  of  future  planning 
[the  aim  of  this  thesis]  is  quite  difficult.  In 
fact  this  is  the  classic  'cost  of  analysis' 
problem  in  decision  theory:  if  a  decision 
analysis  must  include  the  cost  of  the  analysis, 
we  are  led  to  a  recursive,  unterminating 
computation:  what  is  the  cost  of  analyzing  the 
cost  of  analysis?  Thus  it  appears  'impossible* 
to  compute  the  cost  of  decision  analysis. 

Fortunately,  this  does  net  make  the  implementation  of 

decision  analysis  "impossible”. 

2X6  The  STRTPS-iike  problem  solvers 

An  environment  for  problem  solving  has  been  developed 
which  illustrates  many  of  the  general  difficulties,  yet  is 
sufficiently  simple  to  be  tractable.  This  is  the  blocks 
world  of  STRIPS,  ABSTRIPS,  and  LAWALY.  The  world  is 
described  by  assertions  in  a  predicate  calculus  format  (or 
equivalently  a  semantic  network) ;  this  paradigm  holds 
greatest  promise  of  a  uniform  representation  of  knowledge. 
The  basic  world  consists  of  a  robot  in  a  building  with  a 
number  of  rooms  and  doors;  this  basic  shell  is  usually 
populated  with  other  objects:  boxes,  lightswi tches,  keys, 
etc.  Any  operator  available  to  the  robot  is  described  as 
an  ODerator  schema:  generally,  a  statement  of 

preconditions  describing  worlds  to  which  the  operator  can 
be  applied,  and  specifications  of  transformation  s 
describing  the  corresonding  resulting  worlds.  The 
implementations  specify  transformations  by  delete  and  add 
lists,  of  assertions  to  be  deleted 


from  and  added  to  the 
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world  model. 

Many  classical  Artificial  Intelligence  problems  can 
be  cast  in  this  model.  For  example,  Feldman  gives  a 
representation  of  the  monkey  and  bananas  problem.  This  is 
the  framework  in  which  the  current  proposal  was  conceived 
to  operate. 

2.6,1  STRIPS 

The  earliest  of  the  above  systems  was  STRIPS  [5]. 
Ignoring  its  use  of  resolution  principle  for  inference, 
STRIPS  regards  an  operator  relevant  to  a  goal  if  its  add 
list  includes  one  of  the  desired  goal  literals.  That 
literal  is  then  replaced  by  the  precondition  (s)  of  the 
operator,  expanded  until  the  subgoals  thus  generated  are 
true  in  the  initial  world.  When  this  happens,  the  operator 
is  applied  in  the  sense  that  the  world  description  is 
updated;  this  new  world  is  the  "initial”  world  for  other 
planning  branches.  This  depth  first  search  behavior  is  not 
explicitly  required  of  STRIPS;  it  is  merely  the  usual 
result  of  the  method  used  to  select  the  next  goal  literal 
to  be  expanded.  However,  if  search  is  not  depth-first, 
considerable  wasted  effort  may  result.  For  example, 
consider  the  world  of  Figure  2,  and  the  goal  (  {ROBOT 
INROOM  D)  AND  (BOX1  NEXTO  BOX  2)),  Clearly,  if  the 
planning  to  achieve  the  two  goals  proceeds  in  parallel, 
the  plan  to  go  from  room  C  to  room  D  (at  least  as  far  as 


. 
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FIG.  2  DUPLICATION  POTENTIAL  OF  STRIPS 
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room  E)  will  be  generated  twice:  both  conjuncts  seek  a 
path  from  the  same  initial  world.  However,,  transformations 
must  be  made  sequentially;  the  initial  world  for  the 
second  conjunct  is  always  the  resultant  of  the  achievement 
of  the  first.  Even  if  it  is  known  that  the  first  conjunct 
must  be  achieved  before  the  second,  we  may  have  a  problem 
in  planning  the  second  conjunct  before  planning  the  first: 
the  world  resulting  from  achieving  the  first  conjunct  is 
incompletely  specified.  And  there  is  a  further  constraint 
on  the  second  conjunct  it  must  not  undo  the  first. 

These  problems  are  handled  differently  (but  with 
similar  results)  by  two  refinements  of  STRIPS :  ABSTBIPS 
and  LAWALY, 

2^ 6^2  ABSTBIPS 

ABSTBIPS  [8]  orders  conjuncts  by  assigning 
”cr it icality  levels”  to  each  literal.  It  then  solves  the 
problem  completely  at  each  level,  by  ignoring  literals  of 
lower  criticality.  This  ignoring  of  details  is  termed 
"abstraction",  and  the  whole  process  termed  "hierarchical” 
planning.  Because  it  avoids  obviously  false  steps, 
ABSTBIPS  is  about  an  order  of  magnitude  faster  than 
STRIPS,  and  can  solve  correspondingly  more  complex 
problems. 

r 

These  are  somewhat  misleading  terms  for  so  simple  an 
operation  as  throwing  out  details. 


Indeed,  abstraction  and 


•V. 
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hierarchy  do  not  in  general  reduce  to  ignoring  detail. 
Rather,  they  seem  to  correspond  more  closely  to 
transitions  between  whole  systems  of  operators  and 
concepts.  For  example,  STRIPS  and  other  robot  problem 
solvers  work  with  higher  level  operators;  given  a  plan  as 
a  sequence  of  such  operators,  for  example  GOTO  (BOX  1 )  , 
PUSHTO  (30X2) ,  the  real  robot  must  translate  this  into  a 
lower  order  plan  to  operate  stepping  motors  on  its  wheels 
or  whatever.  Sacerdoti3s  later  work  on  NOAH  [9]  gives  a 
better  general  framework  for  hierarchy  and  abstraction. 

2^6^_3  LAWALY 

LA  WALT  [10]  demonstrates  that  the  same  kind  of 
ordering  of  operators  can  be  obtained  in  a  conceptually 
more  straight-forward  manner.  For  example,  given  the  world 
of  Figure  3,  and  the  goal  {{B0X1  INROOM  A)  AND  (ROBOT 
IN ROOM  C) )  it  is  clear  that  (BOX  1  INROOM  A)  should  be 
achieved  first,  Siklossy  characterizes  this  as  remaining 
freedom:  protecting  the  assertion  of  some  literals 
prevents  the  achievement  of  others.  In  this  example,  once 
the  robot’s  position  is  fixed,  it  is  no  longer  free  to 
move  boxes. 

Conjoined  literals  can  be  ranked  by  achieving  last 
that  conjunct  most  easily  achieved,  assuming  the  others 
are  already  true.  By  thus  ordering  conjuncts  {done  for 
preconditions  only  once,  in  advance) ,  avoiding  inference 
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(which  I  haven *  t  observed  to  aid  STRIPS  in  any  meaningful 
way) ,  and  using  a  domain  specific  algorithm  to  find  routes 
between  rooms*  instead  of  blind  search*  LAWALY  achieves 
speeds  about  two  orders  of  magnitude  faster  than  STRIPS* 
and  can  solve  problems  requiring  hundreds  of  steps. 


’ 


CHAPTER  3 


LOOK-AHEAD  DECISIONS 

3 . 1  A  Sequential  Search  Example: 

The  Monkey  and  Bananas  Problem,  a  la  Feldman 

OR 

How  I  Found  my  Snowshovel  in  the  Want  Ads 


To  illustrate  the  probelms  of  including  planning  cost 
in  the  cost  of  a  plan,  we  shall  examine  a  particular 
example  in  detail.  The  example  is  drawn  from  the 
formulation  of  the  monkey  and  bananas  problem  considered 
by  Feldman  and  Sproull  [3],  and  has  the  advantage  that 
theory  is  adequate  to  analyze  it  fully  (it  is  an  instance 
of  the  entrance  fee  problem  discussed  by  DeGroot  [21),  to 
allow  comparison  with  my  proposal  of  a  one  level 
look-ahead  approximation. 


Suppose,  as  in  Feldman,  that  a  monkey  has  obtained  a 
parameterized  solution  to  the  classic  culinary  problem, 
say  (following  Feldman)  : 


WALK  TO  (BOX) 

PUSHTO  (BOX , BANANAS)  />  PLANA  (BOX) 


CLIMB  (BOX) 

CONSUME (BANANAS) 

where  we  consider  BOX  to  be  a  parameter  to  be 
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instantiated.  Let  us  further  assume  that  the  cost  of 
executing  the  plan  is  proportional  to  the  distances 
between  the  monkey  and  the  box,  and  between  the  box  and 
the  bananas;  if  we  have  initially  the  monkey  at  the  same 
location  as  the  bananas,  these  are  the  same.  Now,  let  the 
monkey  and  bananas  be  in  the  center  of  a  circular  room  of 
radius  H,  and  containing  an  arbitrarily  large  number  of 
boxes,  uniformly  distributed.  This  could  be  represented  as 
an  OR  node: 


_ _  i  0  B  S _ _ _ _ _  .  .  . 

1  I. _ 1  !  ~ 

S  i  j 

PLANA  (30X1)  PLANA  |B0X2) 

However,  all  successors  cannot  be  enumerated.  This  is  the 
fatal  flaw  of  evaluation  functions  which  work  only  on 
fully  instantiated  plans.  What  we  really  need  is  something 
that  can  make  the  choices  one  at  a  time: 


i 

I 

PLANA (BOX  1 ) 


OR  f 

_ 3 


3 

PLANA  (BOX  j  not  BOX'!) 


Now,  if 
evaluate 
is  easy 
simply 


we  are  to 
the  costs 
for  the 


follow  Feldman* s  suggestion,  we  need  to 
(utilities)  of  the  alternatives.  This 
fully  instantiated  PLANA  (B0X1)  ;  it  is 


C  =  az 


1 
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where  z1  is  the  distance  of  B0X1  from  the  center  of  the 
room  and  a  is  the  constant  of  proportionality.  But  what 
about  the  cost  of  the  alternative?  Clearly,  this  depends 
on  the  decision  strategy  used.  For  example,  if  we  adopt 
the  strategy  of  evaluating  all  the  alternatives,  and 
choosing  the  least  expensive,  we  have  run-away  cost  if  we 
assign  any  non-zero  cost  to  enumerating  each  alternative. 
Let  us  assume  unit  cost  for  each  such  elaboration.  That 
is,  we  have  incurred  unit  cost  in  going  from: 

PLANA  (BOX) 


to : 


PL  ANA  (BOX  1)  PLANA  (BOX  |  not  BOX'S) 

Now,  to  choose  between  these  alternatives,  our  strategy  is 
to  choose  the  one  of  least  cost.  But  the  cost  of 
PLANA  (BOX  j  not  BOX  1 )  (like  the  cost  of  PLANA  (BOX) )  is 

dependent  on  our  strategy.  Our  reasoning  is  locked  in  an 

infernal  loop. 

Let  us  consider  a  more  realistic  problem  whose 
structure  is  identical  to  that  of  the  many-box  monkey  and 
bananas  problem.  Its  advantage  is  that  the  cost  of 
enumerating  alternatives  is  plausibly  and  simply  defined. 
Supoose  we  wish  to  purchase  a  used  snowshovel  through  the 

want  ads.  Living  in  the  center  of  a  large  city,  with  an 


f  * 


, 
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arbitrarily  thick  newspaper  (like  the  Edmonton  Journal) , 
we  assume  a  uniform  spatial  distribution  of  an  arbitrarily 
large  number  of  snowshovels-for-sale.  We  have  set  aside 
sufficient  cash,  and  valuing  our  time,  wish  to  obtain  a 
shovel  as  quickly  as  possible.  Having  located  a  shovel, 
the  time  it  takes  to  get  it  will  be  proportional  to  the 
distance  to  the  vendor.  We  assume  the  ads  include  no 
addresses;  locating  a  shovel  will  consist  of  finding  the 
next  shovel-ad,  telephoning,  and  determining  the  location. 
Assume  that  locating  the  next  shovel  takes  unit  time 
(cost),  and  that  the  city  has  unit  radius.  Now,  the 
distribution  of  shovels  by  distance  will  be  the  marginal 
f  (x)  of  a  uniform  density  function  g(x,@)  on  the  unit 
circle  (see  Figure  4).  Let  F(x)  be  the  cumulative 
distribution  function  corresponding  to  f  (x)  ,  At  any  time, 
we  may  stop  locating  shovels  and  go  buy  one;  obviously, 
the  minimum  cost  will  correspond  to  the  closest  one.  So, 
we  can  lock  at  our  search  as  a  process  defined  by  one 
parameter,  z,  the  distance  to  the  closest  shovel  found  so 
far.  At  the  next  step,  the  process  may  evolve  to  a  lower 
z,  distributed  like  fix),  or  stay  the  same  if  the  next 
shovel  is  no  closer  than  the  nearest  (see  Figure  5)  . 

Now,  our  decision  strategy  will  consist  of  a  stopping 
rule:  determining  those  values  of  z  for  which  we  should 
stop  the  process  and  obtain  the  shovel  at  distance  z. 
Clearly,  these  values  of  z  will  be  a  set  of  the  form  {z :  z 
<T}  (since  we  can  always  transform  a  lower  cost  plan  into 
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a  higher  cost  one  by  twiddling  our  thumbs) . 

We  can  calculate  the  expected  cost  of  such  a  decision 
strategy.  For  any  T,  let  the  probability  cf  moving  to  z  < 
T  on  any  step  be 


P 

T 


1 


f  |X)  dx  =  T2  . 


0 


So,  the  expected  number  of  steps  before  stopping  is 

oo  i- 1 

N  =  ^  iP  (1-P  )  =  _J. _ =  _ 

T  i=1  T  T  P  T2 

T 

Also,  we  need  the  expected  distance  of  the  successful 
shovel 


E[  z  \  z<T  J  = 


rT 

zf  (z)  dz 

£0 _ _ 

p 

T 


2T3 

3T2 


So  the  expected  cost  is: 

C  =  N  ♦  aE[ziz<Tj 
T  T 

=  +  2aT 

T2  3 

where  locating  a  shovel  has  unit  cost,  and  a  shovel  at 
unit  distance  has  a  cost  of  a.  Now,  the  optimal  T  will  be 
that  T 9  for  which  CT 


is  minimal. 


- 


' 
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_ _ d  C  =  ~2  +  2a 

dT  T  3 

1/3 

T*  =  (3/a) 

For  example,  assume  we  can  obtain  a  shovel  from  the 
outskirts  of  the  city  in  the  time  it  takes  to  locate  30 
shovels  -for -sale.  Then  a=30,  and  T}  =  \J{3/30)  =  .464  . 

That  is,  if  the  edge  of  the  city  is  10  miles  away,  we 
should  keep  locating  shovels  until  we  find  one  within  a. 64 
miles. 


Let  us  look  at  this  problem  in  a  different  light. 
Suppose  we  adopt  a  ’’myopic"  decision  strategy,  where  to 
evaluate  the  cost  of  continuing,  we  assume  we  will  locate 
only  one  more  shovel;  then  choose  this  shovel  or  the 
nearest  one  found  previously.  As  a  decision  tree: 


A 

1 

1 

SYL  1 


E 


1 

seek  a  nearer 


shovel 


Clearly,  the  cost  of  decision  A  is  CA  =  az]t  To  evaluate 
C 3 ,  we  consider  only 


i  B 
_1_ 

D _ |  OR  j 

5  _ 1 

I 

SYL2 


here  clearly  CE  =  CA  =  az, .  We  choose  alternative  D 


I 

I 

SVL1 


. 


p 


- 


iff  CD  <  CE#  so  to  evaluate  CB  we  need  P[CD<C,  j  and 


E[Cd 

1  Cd^Ce]‘ 

Th 

ese 

are  precisely 

the  PT  and 

E[  z 

3  z<T  j 

from 

above,  wi 

th 

T  =  z  , 

.  That 

is. 

P[  C  <  C 

J  = 

z2 

and 

D  E 

1 

E[  C  3  C 

<C 

j  = 

2az  /3 

D  D 

All 

1 

So, 

C  B  =  cost 

of 
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ating  S 

VL2  + 

(probability 

of  choosing 

D) 

*  (expected 
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SVL2, 

given  choi 

ce 

D)  + 

{probability 

of 

c 

hoosing 

T<  \ 

*  {cost 

of 

SVL1)  . 

Substit  uting. 

C  =  1  + 

a  [z 

z3/3) 

B 

1 

1 

Now, 

let  us  de 

ter 

mine 

under 

what 

conditions 

decision  ,  B 

should  be  made 

✓  t 

hat 

is,  for 

what  : 

is  CB  <  cA 

1  +  a  (z 

-  z 

3/3} 

<  az 

1 

1 

1 

1/3 

z  >  i3/a) 

1 

That  is,  we  will  stop  seeking  shovels  when  we  have 

1  /3 

located  one  which  is  as  close  as  {3/a}  7  ;  no te  this  is 

precisely  the  same  as  the  stopping  threshold  obtained 
by  a  full  analysis .  However,  it  is  obtained  by  looking 
only  one  ste£  ahead,  instead  of  considering  all  the 
possible  outcomes  of  an  infinite  number  of  steps. 


Apparently,  breaking  the  infernal  loop  mentioned  at  the 


. 
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outset  after  only  one  iteration  need  not  have  disastrous 
results. 

Such  serendipity  is  by  no  means  guaranteed  for  all 
problems,  of  course.  However,  looking  ahead  one  step  is 
always  better  than  not  looking  ahead  at  all.  I  propose 
that  a  suitable  structuring  of  problems  makes  such  a 
myopic  strategy  adequate:  that  structure  commonly  called 
hierarchal  planning.  In  fact,  making  local  decisions 
compatible  with  global  goals  can  be  viewed  as  the  entire 
objective  of  a  planning  system. 

3^2.  Complications  of  Reconsideration 

Choosing  among  alternatives  is  the  subject  of 
classical  decision  theory.  A  central  result  is  that  each 
alternative  may  be  evaluated  by  a  single- valued  util it y 
[ j j]  representing  the  expected  gain  resulting  from  its 
selection.  For  example,  in  Feldman’s  formulation  of  the 
monkey  and  bananas  problem,  suppose  a  box  could  be  heavy 
or  light,  heavy  boxes  costing  12  and  light  ones  8.  Suppose 
P[ heavy]  =  1/2.  Then  the  alternative  of  using  a  box  would 
have  expected  cost  (negative  utility)  of  (.  5*1  2)  +  { .  5*8)  = 

10.  This  is  represented  graphically  in  Figure  6a  ,  where 
the  round  node  indicates  an  externally  determined  choice 
(as  opposed  to  a  decision,  which  will  be  a  square  node) . 

Suppose  the  monkey  has  an  alternative,  say  of  asking 
a  nearby  man  "Hey  Mister,  can  you  fetch  me  the  bananas?" 


< 


. 


. 

. 

37 


The  man  may  respond  "Sure,  monkey"  (probability  1/2)  or 
"Yehr  but  it’ll  cost  you  40,  you  hairy  ape"  (Figure  6b) , 


Now  let  us  suppose  there  is  some  planning  cost,  or 
other  front-end  load,  for  each  alternative,  say  6  for  the 
box  (the  cost  of  walking  to  it  and  giving  it  a  trial  push) 
and  2  for  the  man  (cost  of  talking) .  Using  a  square  node 
to  represent  the  decision,  we  have  the  tree  of  Figure  6c. 
Usual  methods  let  us  calculate  the  expected  cost  of  the 
man  as  2+20  =  22,  and  of  the  box  as  6+10  =  16.  An  optimal 
decision  then  would  be  to  use  the  box,  and  the  decision 
node  would  acquire  this  cost  (as  shown) . 


However,  this  naive  application  of  Feldman* 
ignores  the  possibility  of  reconsidering  a  choic 
up) ,  something  any  successful  system  must  be  abl 
For  example,  if  the  man  says  "That’ll  be  ^0" 
need  not  spend  the  40  but  may  reply  "Get  real, 
use  the  box". 


s  proposal 
e  (backing 
e  to  do. 
the  monkey 
man.  I’ll 


This  introduces  a  second  level  of  decision  making. 
Similarly,  if  he  considers  the  alternative  at  the  second 
level,  he  is  faced  with  a  third  level  (trivial)  decision 
as  to  which  to  use.  Our  decision  tree  has  grown 
considerably  to  include  these  refinements  (Figure  7a).  Se 
now  see  that  considering  the  box  has  expected  cost  of  6+7 
=  13,  not  16  as  before;  the  cost  of  the  man  drops  even 
more  dramatically,  from  22  to  2+3  =  iu.  dost 
significantly,  we  now  see  that  we  expect  to  spend  fully  3 


I 

. 

. 


. 


' 


■ 


.  - 

. 


. 


.  '  ' 


36 


FIG.  6a  UTILITY  OF  BOX 


FIG.  6b  UTILITY  OF  MAN 


FIG.  6c 


NAIVE  DECISION  TREE 
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FIG.  7a  FULL  TREE  WITH  BACK  UP 


FIG.  7  b 


ONE  LEVEL  TREE 
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units  less  by  considering  the  man  first,  the  opposite 
decision  from  before. 

We  note  that  to  each  alternative  one  can  srill  attach 
a  single-valued  cost  (utility) ,  with  the  optimal  decision 
still  being  the  minimum  cost.  However,  the  cost  of  each 
alternative  depends  on  the  other. 

Clearly,  though,  elaboration  of  the  full  tree  of 
possibilities  is  a  computational  nightmare.  Since  the 


decision  theoretic 

op  timal 

decision  is 

based  on  such 

elaboration , 

the  str 

ategy  of 

doing 

such  a 

thing  in  the 

real  world 

(that 

is,  including 

the  cost 

of  "thinking") 

could  never  be  optimal. 


My 

proposal 

then  is  to  limit  this 

elaboration 

t  0 

one 

level . 

That 

is. 

the  second  level 

decisions 

would 

be 

handled 

in 

the 

"naive"  manner. 

excluding 

f  urt 

her 

reconsider at 

ion . 

The  tree  in  this  instance  becomes 

that 

o  f 

Figure 

7b. 

Rela 

tive  to  the  full  tree,  we  still 

make 

the 

optimal 

decision 

and  knew  its  correct 

cost. 

3^3  Generalized  representation 

What  sort  of  a  generalized  representation  of 
alternatives  does  this  lead  to?  The  distribution  of  costs 
after  an  ’'external  choice"  (in  general,  any  event  making 
available  new  information)  can  be  continuous  rather  than 
discr et e- valued  as  considered  above.  As  welx  as  >.he  cost. 


. 

.  I!  -  2  '• 


r 


uO 


our  estimate  of  probability  of  success  of  the  alternative 
may  be  revised* 

3  .  3 , 1  Need  for  probability  estimates 

Why  not,  as  Feldman  [3]  and  others  have  done,  ignore 
the  probability  of  success  of  an  alternative,  and  plan 
only  using  its  effect  on  utility? 

There  are  two  possible  formulations  of  a  utility 
measure:  as  a  positive  utility  to  be  maximized,  or  as  a 
cost  to  be  minimized.  While  in  theory  one  is  simply  the 
additive  inverse  of  the  other  {±  a  constant) ,  as  concepts 
they  lead  to  distinguishable  paradigms,  the  offset 
constant  becoming  infinite. 

It  is  natural  with  the  utility  paradigm  to  let  u,  the 
utility  of  failure,  be  0.  Feldman  arbitrarily  assigns  200 
units  of  positive  utility  to  achieving  the  goal  of  eating 
the  bananas,  and  an  unspecified  utility  to  not  eating 
(failure) ,  Suppose  now  the  monkey  has  available  a  plan  for 
feeding  costing  200  units,  with  probability  of  success 
1/2.  Clearly  the  (expected  )  utility  of  the  plan  is 
-200+  (.  5*200)  +  i  • 5 u >  =  . 5u- 1 00 .  Now  for  u  >  -200, 
u  >  . 5u- 1 00 ,  that  Is,  the  utility  of  failure  is  greater 
than  that  of  the  plan  for  feeding.  So,  the  monkey  will 
orefer  to  sit  and  starve  to  death  rather  than  trying  to 

eat. 


f 


. 


It  is  intuitive  to  demand  of  any  decision  maker  that 
any  attempt  to  eat,  no  matter  how  costly  or  unlikely  to 
succeed ,  is  preferable  to  failure  {starvation) .  This  is 
because  there  is  no  state  of  suspended  animation;  failure 
incurs  a  cost  that  increases  with  time  unboundedly.  Since 
we  are  forced  to  talk  of  negative  utilities,  it  is  more 
natural  to  adopt  the  cost  paradigm. 


With  costs,  it  is  intuitive  to  assign  a  "cost"  of  0 
to  success.  Then  let  £  be  the  cost  of  failure.  What  is  the 
expected  cost  of  an  alternative  not  guaranteed  to  succeed? 
Clearly  it  is  t+ ( 1 - p) f  where  t  is  the  expected  cost  to 
complete  the  alternative  and  reach  a  decision  of  success 
or  failure  {probability  of  success  p) .  Now  our  Intuitive 
demand  that  t+{1-p)f  <  f  for  any  permissible  values  of  t 
and  p  •  Implies  t  <  pf  or  f  >  t/p.  So  f  — >  oo  as  p  — >  0 
for  fixed  t.  We  cannot  simply  let  f  =  oo  however;  the 
expected  cost  of  any  alternative  not  guaranteed  to  succeed 
would  also  be  infinite.  Usual  decision  theory  demands  that 
costs  {utilities)  be  finite: 

...  in  an  ordinary  problem  there  is  neither 

3  heaven  *  nor  *  hell  *  in  R  [the  set  of  rewards]. 

[  2  ,  p .  103] 

But  failure  is  hell.  This  assumes  the  failure  is  a  "top 
level"  failure,  that  is,  one  to  which  there  is  no 
alternative  having  any  possibility  of  success. 


What  can  we  do?  In  practice,  we  only  want 
between  ’’non-f  ail  ure”  alternatives. 


to  chose 

a2)  ; 


say  (A  ,  OR 
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failure  can  never  be  considered  as  an  alternative.  What  is 
the  expected  cost  E1  of  A 1 ?  Clearly, 

E  =  t  +  (1-0  )t  +  (1-p  )  (1-p  )f 
1  1  12  *  1  2 

where  A,  and  A2  have  cost  and  probability  ^ 1 1 #  p  >  and 
{t2rP  )  r  respectively.  Similarly, 

E  =  t  +(1-p  )t  +(1-p  )  (1-p  )  f 
2  2  2  1  1  2 

and  we  chose  E,  if  E1  <  E2,  that  is,  if 

t  +  1 1  -  p  )t  +  (1-p  )  { 1  -  p  )  f  <  t  +  ( 1  -  p  )  t  +  ( 1  -  p  )  ( 1  -  p  )  f 

1  1  2  1  2  2  2  1  1  2 

t  -(1-p  )t  <  t  -(1-p  )  t 
1  2  12  12 

t  t 

_ 2_  <  _ _ 2_ 

P  P 

1  2 

Fortunately,  the  f  terms  cancel.  That  is,  we  need  not  be 

concerned  with  the  cost  of  ultimate,  top  level  failure  in 

making  decisions  based  on  expected  costs.  It  is  useful  to 
talk  of  the  cost  of  an  alternative  excluding  the 
possibility  of  failure.  In  the  above  example,  E,  would 
become  t1  +  (1-pi)t2,  and  similarly  for  E2.  Note  that  the 
‘•internal”  costs  t1  and  t2  are  also  of  this  type.  Ail 
costs  throughout  the  remainder  of  this  thesis  will  exclude 
the  possibility  of  failure,  and  will  simply  be  referred  to 
as  "cost”.  This  creates  no  problem  when  choosing  among 
alternatives ;  however  we  note  that  failure  cannot  be 
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handled  in  a  uniform  way,  as  merely  another  alternative 
(but  with  infinite  cost) .  We  note  now  that  cost  depends  on 
the  char acteristics  of  the  alternative  a2 ;  we  cannot 
evaluate  the  cost  (utility)  of  an  alternative  without 
knowing  the  other  alternatives  available.  Therefore,  we 
must  characterize  an  alternative  by  both  its  internal 
expected  cost  and  its  probability  of  success. 


3.3,2  Characterization  of  an  alternative 

Now  that  the  need  for  a  two  component 
characterization  of  an  alternative  has  been  demonstrated, 
how  shall  an  alternative  be  characterized  as  a  two  step 
process? 


The  first 
term  s.  After  t 
to  (ts  rp‘)  ; 
distr ibuition  f 
of  the  second  s 
expected  cost 
assumed  not  to 
of  generality, 
step  cost  of  0) 


step  will  have  an  expected  cost,  which  we 
his  step,  the  (t,p)  estimates  are  revised 
this  can  be  described  by  a  bivariate 
unction  f(t*,P*)«  The  (prior)  expected  cost 
tep  will  be  E(t*j,  so  then  the  total 
must  equal  s  +  E[t'].  Also,  the  process  is 
succeed  during  the  first  step  (without  loss 
since  success  is  equivalent  to  a  second 
/  so  p  =  ECp1  ]. 


For  example,  suppose  in 
problem  that  the  cost  of  pushing 
weight,  which  is  normally 
determined  by  giving  a  box  a  tria 


the  monkey  and  bananas 
boxes  is  proportional  to 
distributed  and  can  be 
1  push.  Then  if  it  costs 


. 
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6  to  walk  to  a  (particular)  box  and  give  a  trial  push, 
f(t')  will  be  normal.  Suppose  also  that  touching  a  box 
revises  our  estimate  of  its  climbability ;  then  p  will  also 
be  revised  (say  normal  also,  independent  of  t)  .  Then  the 
alternative  of  continuing  with  this  box  will  be 
characterized  by 

s  =  6  fit’rp1):  bivariate  normal 


3.3.3  Choosing  among  alternatives 

How  are  we  to  choose  between  alternatives  so 
characterized?  Our  one  level  look-ahead  criterion  presumes 
simple  decisions  one  level  ahead;  these  are  simply  those 
derived  in  Section  3.3.1,  namely  to  choose  A ]  if 

t  t 

_ 1_  <  _ 

P  P 

'  1  2 

That  is,  the  ratio  t/p  is  a  measure  of  the  (un) worthiness 
of  an  alternative.  It  gains  intuitive  meaning  from 
realizing  that  it  is  the  expected  cost  to  success  of  a 
series  of  alternatives  of  (t,p)  pursued  sequentially.  For 
example,  suppose  we  have  an  infinite  set  of  boxes,  each 
having  probability  p  of  containing  a  ball.  If  it  costs  t 
to  search  each  box,  the  expected  cost  of  finding  a  ball 
will  be  t/p.  This  use  of  t/p  as  a  measure  of  worth  has 
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been  derived  by  others  in  this  sequential  search  context, 
most  recently  by  Simon  and  Kadane  [11]. 

No w  we  can  consider  one  level  look-ahead  choice.  Re 
consider  the  cost  of  choosing  an  alternative*  say  A,,  and 
investing  the  effort  s1  to  revise  our  estimates  of  p  and 
t ,  to  p,]  and  t^  according  to  distribution  f  1  (t^  ,p*  )  . 
Then  we  will  use  the  simple  criterion 

t  *  t 

_ 1_  <  _ _ 2_ 

P*  P 

1  2 

to  choose  whether  to  continue  with  A 1  or  switch  to  A2.  Let 
U  be  the  event  that  A 1  is  so  chosen;  this  will  correspond 
to  the  shaded  region  (Figure  8)  of  the  (t*,p*)  plane 
bounded  by  the  line  (t*  /p*  )  =  lb2/p2) .  Then  the  expected 
cost  E,  of  choosing  A1  at  the  outset  is 

E  =  s  +  P[  U]  {S[  t 5  |  UJ  +  (1-2f  p*  |U  ])  t  ] 

11  1  12 

+  (1-P[0]){t  +(1-p  )  E[  t *  1  not  U  ]} 

2  2  1 

Similarly*  if  we  let  V  be  the  event  that  (t  *2 /p  1  )  < 

(t1/p]  )  * 

E  =  s  +P[  VJ  (E[  t3  |  V]+(1-E[  p*  3  ?  ]}  t  } 

2  2  2  2  1 

+  (1-P[Vj){t  +(1-p  )  E[  t  *  1  not  YJ] 

1  1  2 

Our  decision  criterion  is  now  to  chose  A 1  if  E,  <  E2. 


. 


This  defines  a  binary  comparison  of  ORed 

alternatives.  Although  this  comparison  is  not  obviously 
transitive  (exceptions  are  difficult  to  imagine),  it  can 
be  extended  to  chose  one  of  n  alternatives,  as  follows. 

Consider  a  set  of  n  alternatives  (A  s :  1<i<n} . 

Numerically  ranking  the  values  Itj/Pj)  will  produce  a 
permutation  r  of  the  integers  1  to  n,  such  that 

t  t 

_£jil  S  _rli±H 

P  P 

r  (i)  r  (i+1) 

Then  we  can  define  probability  of  success 

p  =  i-  TT  o-p  ) 

1 <j<n  j 
and  expected  cost 

e  =  5  t  TT  i  i-p  ) 

1  < k< n  r  (k)  1<j<k  r&j) 

of  trying  the  alternatives  in  the  order  defined  by  r.  To 
develop  a  one  level  look-ahead  choice,  we  consider  the 
cost  Ej  of  starting  with  Aj  and  then  either  completing  A  -t 
or  switching  to  try  the  remaining  alternatives  in  the 
order  defined  by  r.  Define  g;  to  be  the  probability  or 
failure  of  all  alternatives  except  A; : 


. 
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q  =  TT  (i-p  ) 

i  1  <  j  <  n  j 

and  Cj  to  be  the  expected  cost  of  trying  the  alternatives 
except  A ; : 

c  =5  t  TT  ti-p  ) 

i  1<k<n  rik)  1<j<k  rij) 

r  (k)  t±  r  ( j)  *i 

Then  let  I  be  the  event  that  it9,  /p1.)  <  (C  .  /  il-q  ;  )  )  ,  that 
is,  that  we  choose  to  continue  with  A , .  So 

E  =  s  +  P[  I]  (E£  t*  1 1  ]+  (1  -E(  pJ  j  I  J)  C  } 
i  i  i  i  i 

+  11-PCIJ)  {C  +  q  E[  t 9  ]  not  I]} 
i  i  i 

Because  of  shared  terms  among  the  C i;  the  complexity  of 
this  computation  increases  only  linearly  with  n. 

The  question  now  arises:  if  costs  of  aitervatives  are 
interdependent  and  so  can't  be  used  alone,  why  not 
characterize  alternatives  by  only  their  unworthiness  t/p? 
This  is  clearly  independent  of  the  alternative,  yet  would 
allow  simple  and  look-ahead  decisions  analogous  to  those 
considered  above,  with  only  one  parameter  instead  of  two. 
The  answer  lies  in  the  other  uses  of  characterizations 
pother  than  choosing  among  ORed  alternatives) .  For 
example,  it  is  clear  how  to  combine  the  (t,p) 
characterizations  of  conjuncts  to  get  a  characterization 


. 


' 


. 
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of  the  conjunction,,  but  this  is  not  so  with  t/p. 

3^4  Practical  representation 

A  problem  of  practical  importance  now  arises.  What 
sort  of  a  computationally  manageable  approximation  of  the 
bivariate  distribution  function  f  (t * , p ' }  is  best?  Four 
categories  of  representation  were  considered. 

3.4.1  Parameterized  distribution 

This  category  was  dismissed  because  of  the  difficulty 
of  evaluating  the  E[  t  *  j  (t  8/P  3 )  < w  ]  and  E[  p  1 1  (t  * /p  * }  <w  ] 
terms  of  the  decision  criterion.  For  example,  although 
polynomial  approximations  are  available  for  the  bivariate 
normal,  the  shape  of  the  regions  lt*/p,)<w  is  intractable. 

3.4.2  Discrete  points 

This  type  of  representation  approximates  f(t*,Ps;  by 
a  number  of  points  (t,,pt)  each  with  probability  q-»  For 
example,  a  uniform  distribution  on  t<1  might  be 
approximated  as  in  Figure  9. 

This  was  rejected  as  being  too  pessimistic  in  cases 
of  small  t/p;  in  the  example,  the  approximation  indicates 
no  possibility  of  t/p  going  lower  than  1/2.  It  is  often, 
of  course,  just  such  "remote"  possibilities  that  we  are 


interested  in. 
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FIG.  10 


STEP  FUNCTION  REPRESENTATION 
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3,4.3  Rectangular  prisms 


5  1 


This  representation  would  approximate  the  value  of 
f{t’,p*)  as  a  constant  in  a  region  bounded  inside  and  out 
by  rectangles,  as  in  Figure  10.  Thus  we  have  a  two 
dimensional  step  function;  two  to  five  steps  would 
probably  be  needed.  The  problem  here  too  lies  in  computing 
E[t*3U]  and  E[p9|U]  terms:  the  number  of  cases  resulting 
from  dividing  a  rectangle  with  a  line  is  excessive. 

3.4.4  Set  of  lines  and  points 

The  representation  finally  adopted  is  of  this  type. 
All  probability  is  condensed  onto  a  set  of  lines  and 
points.  The  one  dimensional  distributions  along  the  lines 
are  represented  as  a  piecewise  linear  function  with  up  to 
five  pieces.  This  type  of  representation  should  be  less 
crude  than  either  the  discrete  only  or  step  functions 
considered  above,  as  well  as  being  computationally  simpler 
than  a  step  function. 

Several  sorts  of  line  condensations  were  considered. 
A  number  of  variations  had  lines  radiating  from  some 
central  point,  usually  the  mean  (Figure  11) . 

A  common  problem  of  radial  lines  is  that  different 
points  will  represent  the  condensation  of  different  areas. 
For  example,  the  repr esentation  of  a  uniform  distribution 
on  t  *  <1  might  be  as  in  Figure  12, 


where  the  distribution 
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FIG.  11  POSSIBLE  RADIAL  REPRESENTATIONS 
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along  line  A  is  defined  by  condensing  into  points  X  on  the 
line  the  probability  between  the  diagonals.  The  source  of 
the  difficulty  is  computing  probabilities  for  intervals  on 
these  lines:  the  distribution  function  must  be  scaled  by 
the  shrinkage  of  area;  the  result  is  in  general  (for 
reasonable  shrinkage  functions)  quadratic.  Lines  should  be 
parallel  to  avoid  this  problem. 

The  question  of  slope  of  the  lines  remains.  Should 
they  be  parallel  to  the  t  axis,  the  p  axis,  or  somewhere 
between?  The  preferred  representation  has  them  parallel  to 
the  t  axis;  this  choice  was  motivated  by  the  non-linear 
distortion  of  lines  of  any  other  slope  on  combining 
representations  at  an  OH  node  using  a  combining  method 
later  judged  nonpref errable. 

However,  I  believe  the  choice  is  still  sensible. 
Lines  not  parallel  to  one  of  the  axes  are  unappealing  and 
of  no  apparent  advantage.  And  it  would  seem  desirable  to 
have  more  detail  about  variations  in  cost  than 
probability,  rather  than  vice  versa. 

So,  the  representation  proposed  is  to  condense 
probability  onto  a  set  of  five  or  less  lines  parallel  to 
the  t  axis,  and  into  a  similar  number  of  points  (if 
desirable) .  The  selection  of  numbers  and  locations  of  the 
lines  and  points,  and  condensation  rules,  are  left  to  ad 


hoc  decision. 
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3^5  Backing  up  characterizations 

The  control  structure  I  aa  proposing  is  like  that  of 
MULTIPLE.  This  requires  the  ability  to  ’’back  up” 
characterizations  over  OR  and  AND  nodes,  so  that  the 
character izations  of  higher  nodes  may  be  refined,  and 
possibly  choices  altered,  as  elaboration  proceeds. 

3.5.1  OR  nodes 

Let  us  treat  the  OR  node  first.  Here  we  have  two 
alternatives  A ]  and  A2  (n-ary  ORs  can  be  strung  out  as  a 
sequence  of  binaries! ,  one  of  which  will  be  the  “choice” 
by  the  proposed  decision  criterion,  say  A 1 . 

The  simplest  possibility  is  for  the  OS  to  adopt  the 
characterization  of  the  choice  (A,).  However,  this  says 
nothing  of  the  advantage  that  Is  gained  by  having  a 
non-failure  alternative:  the  characterization  is  the  same 
if  A 2  is  only  slightly  less  good  or  if  A2  is  total 
failure.  The  error  in  the  backed  up  probability  of  success 
is  especially  glaring. 

A  way  of  combining  the  characterizations  is  suggested 
by  the  formula  for  expected  cost  of  starting  with  A,: 

S  =  s  +P[U]  {E[t*  !U]+(1-E[p»  |UJ)  t  } 

11  1  12 

+  P[  not  UJ{t  +i1-p  }  E[  t *  1  not  U  ]} 

2  2  1 


■ 
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Now  consider  t*  (t  *,  ,p*2),  the  expected  remaining  cost  of 
the  OR  given  that  the  initial  step  of  A,  has  been  taken 
(at  a  cost  of  s1 )  ,  and  we  have  values  for  t 1 1  and  p,]  .  If 
(t  • ,  /p  ■ ,  )  <  (t2/p2)  (event  U;  see  Figure  8  above),  we 
continue  with  A 1  and  t9  =  t\  +i1-p,1)t2.  If  not,  we  switch 
to  A  2  and  t9  =  1 2*  (1-p2)  t 9 2  •  ‘That  is. 


t®  +  (i-pi)t  if 
1  1  2 


t • (t 9 ,p9 ) 
1  1 


=  < 


t  +  ( 1  -  p  )  t 9 
w  2  2  1 


if 


Similarly,  the  probability  of  the  OR: 


P9  (t«  ,p' ) 
1  1 


1-  (1  -p 9  5  (1-p  ) 
1  2 


These  functions  will  map  the  f ,  (t91 
a  new  distribution  for  the  OR.  The 
the  OR  would  be  s 1 ;  t  and  p  of 
expected  values  of  the  transformed 


,p,1)  distribution 
initial  stage  cost 
the  OR  would  be 
distribution: 


into 
s  of 
the 


t  =  s  +  E[t«  J 
1 

P  =  E[p«  J 


To  evaluate  this  method  of 
to  consider  an  example.  Suppose 
uniform  on  ts,  <  1  (and  zero  for 
vanishingly  small,  and  let  t2 


combination,  it  is  useful 
for  A,  that  f  ,  (t  ,  p 9 ,  )  is 
t9,  >  1).  Let  s,  be 


=  1/2,  p.  =  1/2.  Then 


« 
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1/2  and  p  ,  =  1/2.  Suppose  we  condense  the  uniform  f ,  onto 
five  lines  at  p  =  .  1 , .  3„  .  5, .  7,  and  .9  (see  Figure  13a). 
The  transformed  f(t*,p*)  is  in  Figure  13b.  While  the 
expected  probability  of  success  rises  from  .5  to  .75,  the 
expected  cost  also  rises  from  .5  to  .6675,  so  the 
unworthiness  t/p  falls  from  1  to  .89. 

However,  this  improvement  in  gross  parameters  is 
achieved  at  the  expense  of  the  possibility  of  large 
improvement.  While  with  A ,  the  cost  and  unworthiness  could 
fail  to  zero,  the  OR  node  indicates  that  these  will  not 
fall  below  .05  and  .0556  respectively.  Thus  while  our 
decision  criterion  say  well  prefer  A,  to  a  potential  A3 
with  t/p  <  .0556,  \K }  OR  A2)  will  never  be  preferred.  This 
does  not  seem  acceptable.  Why  should  supplying  a 
non-failure  alternative  make  A1  less  desirable? 

The  answer  lies  in  the  status  of  failure  as  a 
non-alternative  (see  Section  3.3.1):  the  characterization 
of  A 1  is  not  what  one  would  get  by  combining  (A 1  OR 
Failure) .  Indeed,  the  difficulty  with  characterizing  such 
a  combination  is  the  reason  for  making  failure  an 
exception. 

We  see  the  problem  at  another  level.  The 
transformation  proposed  for  iA1  OR  A2)  was  derived  on  the 
assumption  that  the  alternative  to  (A,  OR  A2)  was  failure; 
this  will  not  usually  be  the  case:  the  weakness  of  this 
transformation  appears  when  we  consider  a  rhird 
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density  on  lines 
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FIG.  13a  PARALLEL  REPRESENTATION  OF  UNIFORM 
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non-failure  alternative 


What  can  we  do?  Rather  than  go  back  to  simply  using 
the  characterization  of  the  choice  alternative,  I  suggest 
the  following.  Divide  the  f 1 (t 1  ,  p1,)  distribution  along 
the  line  =  (t2/p2 )  .  Let  f  (t 9  ,  ps }  =  f,^',  fp',) 

in  the  region  1 1 a  1  /p*  }  <  (t2/p2)  ,  and  condense  the 

probability  in  the  region  (t^/p*,)  >  (t2/p2)  into  a 

single  point  at  (t2,p2).  The  illustrative  example  would 
then  look  like  Figure  14.  This  allows  improved 
alternatives  to  be  reflected  in  the  characterization, 
without  sacrificing  the  desirability  of  the  disjunction  in 
further  comparisons. 


3.5.2  AND  nodes 

Let  us  now  consider  backing  up  over  an  AND  node  (G  , 
AND  G 2  AND  ...  AND  Gn),  where  the  conjoined  subgoals  are 
characterized  like  the  alternatives  dicussed  above,  that 
is,  Gj  will  be  characterized  by  initial  cost  sir 
probability  p ; ,  cost  tir  and  a  distribution  of  revised 
estimates  fj  (t'j  jp'j  }.  The  AND  will  be  characterized  be  s, 
t,  p,  and  f (t  * , p8 ) . 

Assuming  independence  of  the  subgoals,  it  is  clear 

that 

P  =  TT  P  and  t  =  2  t  . 

1<i<n  i  1<i<n 


i 


tfrflbilfc  /to x  l  (  ,*<J*  ,4  ^  %  *  -w 
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If  we  assume  the  subgcals  are  pursued  in  parallel  (or 
rather,  breadth-first) ,  then  we  would  have 

s  =  s 

1<i<n  i 

and  f(t*,p,|  would  be  the  distribution  of  the  sum  and 
product  random  variables 

P,=  TT  p ’  and  t9=  >  t* 

1<i<n  i  1<i<n  i 

This  is  computationally  difficult,  involving  the 
convolution  of  distributions.  Also,  pursuing  the  subgoals 
in  parallel  is  a  poor  strategy;  we  should  do  first  what 
will  affect  our  (t,p)  estimates  most  drastically,  that  is, 
'•schedule”  first  (for  planning,  not  execution)  that 
subgoal  which  effects  the  most  rapid  change  in  the 
character  of  the  problem. 

Consider  the  problem  of  scheduling  the  planning  of 
two  conjoined  subgoals  G  ,  and  G2.  The  argument  is  made  in 
relation  to  an  alternative  A4: 


A  _ _ _ |  OR  \ 

3  I  i _ 1 

_ i  ~aI  d~  j _ 

S  * _ i  ! 

i  i 

G  G 

1  2 


I 

I 

A 

4 


We  will  consider  the  expected  cost  of  i (G ,  THEN  G2)  OR  A4) 


. 


/ 

■ 
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against  ((G2  THEN  G , j  OH  A4).  Suppose  investing  s,  in  G, 
does  not  affect  our  estimates  of  p,  and  t , ,  that  is,  p 9 ,  = 

p,  and  t* ,  =  t,-s,.  Then  if  we  choose  A3  initially,  we 

will  still  choose  A3  after  sM  and  (presumably)  invest  s2 
in  G  2 »  Suppose  after  s2  that  (t  *2  =0  ,p  *2  =0)  and 
(t  *2  =1.  Qf  p*2  =1.  0)  each  occur  with  probability  1/2.  Then 
after  investing  s2  we  can  choose  between  A3  and  A4.  Now, 
the  expected  cost  E21  of  ( (G  2  THEN  G,)  OR  A4)  is 

E  =  s  +  •  5t  +  .  5t 
21  2  1  4 


and  similarly 


E  =  s  *s  + .  5t 9  +  .  5t 
12  12  1  4 

=  s  +s  +.5t  -.5s  +.5t 

12  1  14 


=5  +.5s 

21  1 

that  is,  when  doing  (G,  THEN  G2),  half  the  time  planning 
step  s,  is  wasted. 


This  motivates  our  intuition  that  when  there  is  an 
alternative  to  a  goal,  we  should  pursue  the  goal  in  such  a 
way  that  information  causing  us  to  switch  to  the 
alternative  should  be  utilized  as  soon  as  possible. 
Further  analytical  justification  is  in  the  comparison  of 
alternatives  with  normal  fit*)  (Section  3.6);  the  larger 
the  standard  deviation  of  t9 ,  the  more  preferred  the 


alternative  to  a  uniform  one. 
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How  can  we  apply  this  to  scheduling  conjoined 
subgoals?  I  propose  scheduling  first  that  subgoal  which 
maximizes  the  expected  rate  of  change  of  the  t/p  ratio  of 
the  AND. 

Let  us  construct  a  measure  of  this  rate  of  change  for 
{G 1  THEN  G 2 ) .  Define  event  0  that  t/p  decreases  after 
investing  s,  in  G,,  and  event  V  =  not  0.  Then  a  reasonable 
measure  of  change  of  t/p  will  be  the  expected  absolute 
value  of  the  change: 

P[U]*B  -  33  Hj  +  P[VJiBJV  -  B  ) 

0  0 

t  +t  E[  t*  j  0  ]+t 

where  B  = _ 1_ _ 2_  ,  BjU  = _ _1 _ _ 2_ 

0  p  p  2[  p *  1 0  Jp 

12  12 

E[  t 1  3?  J  +  t 

and  BjV  =  _ _ _ j. _ _ 2 

EfpMOJp 
1  2 

This  is  computationally  easier  if  events  0  and  V  are 
independent  of  G  .  Define  event  U,  that  ( t *,  /p • ,  )  < 
/p  )  ,  and  V  similarly;  then  the  change  measure  will  be 

C  =  pro  ](B  -  B  3  0  )  +  PCV  ]  (B 1  V  -  B  ) 

1  10  1  110 

Clearly,  the  Eft*,  JO,]  etc.  terms  can  be  calculated  for  G, 
indeo en dently  of  G2/  so  evaluating  C ,  is  a  simple  matter 


of  ra ult iplication  and  addition 
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Bate  of  change  of  unworthiness  then  would  be  C^s,; 
this  can  be  defined  similarly  for  any  one  of  n  subgoais. 
They  would  be  scheduled  for  planning  then  by  ordering  the 
C i /s  j  ratios. 

Let  us  return  to  the  question  of  backing  up  the 
character ization  of  subgoals  to  an  AND  node,  given  the 
(planning)  schedule  (G  ,  THEN  G2  THEN  THEN  Gn)  as 
defined  above.  We  can  then  let  the  initial  step  s  of  the 
AND  be  s,  ,  that  is,  the  first  step  of  G,  is  considered  to 
be  the  first  step  of  the  conjunction.  Then 


t  =T 


I  <i<n 


P  =  H  P 

1 <i<n  i 


and  f(t9,,p*}  be  f  (t*,p*)  translated  by 


T 


2<i<n  i 


and  scaled  by 


p  =  TT 


<i<n  i 


For  example,  if  n— 3  and  all  subgoals  are 
characterized  by  the  uniform  distribution  considered 
before  (Figure  15),  then  fit4,?*)  will  be  as  in  Figure  15. 

Tj^ts  method  of  combination  can  ue  cratized  on  >_he 
same  grounds  as  the  first  proposed  method  for  OR  nodes 
(Section  3.5,1);  possibility  of  small  values  of  t  (and 
large  p)  is  unduly  pessimistic  (ie,  Cj  *  However,  no  easy 
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alternative  is  available. 

Also,  using  the  elaboration  of  a  single  subgoal  to 
represent  the  ” next  step1*  of  a  conjunct  may  not  be  the 
best  way  of  splitting  the  conjunct  into  two  steps.  That 
is,  the  first  step  of  a  single  subgoal  may  have  little 
overall  effect  on  a  conjunction  of  perhaps  20  similar 
subgoals  --  it  would  be  more  natural  to  consider  the  first 
step  of  the  conjunction  to  be  the  combination  of  first 
steps  of  all  the  subgoals  in  this  case.  Such  refinement 
however  seems  too  remote  and  difficult  to  deal  with  here. 

The  concept  of  using  expected  rate  of  change  of  the 
t/p  ratio  to  schedule  subgoals  can  be  applied  to  OR  nodes 
as  well.  However,  in  the  disjunctive  case,  the  expected 
cost  of  the  OS  is  dependendent  upon  the  order  in  which  the 
disjuncts  are  tried,  a  complication  not  encountered  with 
conjuncts.  While  not  an  impossible  complication,  there  is 
no  indication  that  this  method  will  help  achieve  our 
ultimate  goal,  lowest  expected  cost.  Lowest  expected  cost 
leads  us  to  rate  of  change  of  t/p  for  conjuncts,  but  leads 
directly  to  the  choice  method  of  Section  3,5.1  above  for 
disjuncts. 

3 . 6  Approximating  a  multi-step  process  with  a  two  step  one 

Suppose  we  have  a  multi-step  process,  after  each  step 
of  which  we  acquire  more  reliable  information  on  remaining 
cost.  Where  should  we  break  such  a  process  into  only  two 
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stages,  for  consideration  by  our  one  level  look  ahead 
decision  criterion? 


Consider  two  drills  D,  and  D 
pool  of  oil  (Figure  16).  The  oil 
B2  below  D2 ;  define  difference  d 
investment  of  unit  effort  deepens 
unit  effort  deepens  the  hole  X 
random  variable  with  mean  y  and 
define  r  =  6x/y.  Each  process  can 
in  infinitely  many  ways,  charact 
units  invested  in  the  first  stage 


drilled  Z 

will 

be  sy,  with  a 

V  (sO2)  = 

ry  \IT. 

Thus  second  st 

and  E[ t 1 7  ] 

=  (3  2  ” 

sy)/y;  tf2  has  v 

2  attempting  to  re 
is  depth  B,  below 


=  B 


B2  .  Fo 


the  hole  y  feet.  F 
feet,  where  X  is  a 
standard  deviatio 
be  split  into  two 
erized  by  the  numbe 
.  For  D 2 ,  expected 
variance  of  sO2,  o 
age  cost  t*2  =  (B2 

ariance  se2/y2. 


ach  a 
D ,  and 
r  D ,  , 
or  D2  r 
normal 
n  Qx; 
stages 
r  s  of 
depth 
r  0  z  = 
-z)/yr 


The  key  observation  is  that  for  a  uniform  process  of 
this  type,  the  variance  of  the  second  stage  cost  is 
proportional  to  the  cost  cf  the  first  stage. 


What  size  of  first  step  will  represent  the  use  of  two 
stages  to  maximum  advantage?  If  the  first  step  is  too 
small,  we  gain  little  information  about  the  possible 
improvement  in  overall  cost,  since  the  variance  is  also 
small.  If  the  first  step  is  too  large,  we  already  have 
such  a  large  investment  in  the  process  that  we  lose  the 
advantage  of  having  an  alternative  available.  It  would 
seem  there  should  be  an  optimum  between  the  two  extremes. 
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FIG.  16  OIL  WELL  EXAMPLE 


FIG.  17  TWO 


STAGE  VERSUS  ONE  STAGE 
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Let  us  consider  this  problem  in  more  general  terms. 
Suppose  we  have  a  two  stage  process  where  the  remaining 
cost  X  after  a  first  stage  of  s  is  a  normal  random 
variable  with  mean  t®  and  standard  deviation  0,  and  we 
wish  to  compare  if  with  a  second  process  of  fixed  cost  u 
(one  stage  process;  Figure  17) .  The  overall  expected  cost 
of  the  two  step  process  will  be  t  =  s+t® .  Define  the  cost 
difference  d  =  t-u.  The  cost  of  beginning  with  the  two 
stage  process  and  then  reconsidering  will  be 

C  =  s  +  P[ X<u JE[X j X<u J  +  P [ X > u  J u 

Considering  this  as  a  function  of  u,  let  us  seek  solutions 
to  the  equation  C=u„  At  this  point,  the  expected  cost  of 
beginning  with  either  process  is  the  same,  although  the 
two  stage  process  has  independent  expected  cost  u+d.  That 
is,  d  can  be  considered  as  a  measure  of  worth  for  the  two 
stage  process;  it  represents  the  advantage  of  the  two 
stage  over  the  one  stage  process. 

Numerically  derived  solutions  to  C=u  are  tabulated  in 
Table  1  and  plotted  in  Figure  18,  in  terms  of  d,  for 
different  values  of  first  stage  cost  s  (both  normalized  in 
units  of  0).  That  is,  the  first  table  entry  means  that  for 
first  step  cost  s  equal  C,  the  solution  to  C=u  is  for  u  = 
t-d  with  d  equal  0.10. 

What  is  the  implication  for  a  uniform  process  like 
the  drilling  of  two  wells,  where  02  is  proportional  to  s? 
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TA3LS  1 

SOLUTIONS 

H3 

O 

O 

II 

P 

S 

d 

s  d 

cr 

cr 

cr  a 

1,0 

0.100 

0.  100 

0,5 

0.312 

0.  156 

0 . 4 

0.398 

0. 159 

0.3 

0.518 

0.155 

0.2 

0.697 

0 .  139 

0.  1 

1  ,00 

0.  100 

0.03 

1.10 

0.088 

0.06 

1,23 

0,074 

0.04 

1.41 

0.056 

0.0  2 

1 .68 

0.034 

0.01 

1  .92 

0.019 
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cr 


FIG.  18 


PROPERTIES  OF  SOLUTIONS  TO  C  =  U 


7  1 


To  represent  the  process  to  maximum  advantage*.  we  should 
chose  the  first  step  cost  s  giving  maximum  d.  In  this 
uniform  case,  for  some  constant  b,  02  =  bs  so  o  =  bs/O, 
and  d  =  <3  (d/0)  =  b  (s/C)  (d/O)  .  So,  from  the  plot  of 
(s/0)  (d/0)  we  can  see  that  d  is  a  maximum  for  s  =  . 40  =  d. 
That  is,  the  attractiveness  of  a  multi-stage  uniform 
process  over  a  single  stage  one  is  best  approximated  by  a 
two  stage  process  with  the  first  stage  cost  equal  4/10  of 
the  standard  deviation  of  second  stage  cost.  That  is,  the 
first  stage  cost  is  chosen  so  the  standard  deviation  of 
second  stage  cost  will  be  2.5  times  as  great. 

tfhat  about  the  more  usual  cases  where  the  process  is 
not  uniform?  I  have  argued  that  it  is  reasonable  to 
structure  planning  processes  hierarchically,  where  the 
most  drastic  refinements  to  remaining  cost  estimates  are 
done  first.  Then  variance  will  be  less  than  a  linear 
function  of  effort  invested,  that  is  e2<bs,  and  if  we 
define  g  (s)  =  02/s,  then  g  Is  a  decreasing  function  of  s, 
and  so  d  =  a  (d/O)  =  gts)  (s/C)  (d/C).  Then  the  maximum  d 
will  correspond  to  smaller  s  than  the  maximum  for  g(s)  a 
constant.  That  is,  with  hierarchical  planning,  we  choose 
first  stage  cost  so  that  the  standard  deviation  of 
remaining  cost  is  greater  than  2.5  times  first  stage  cost 
(as  is  optimal  for  the  uniform  case) . 
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3 . 7  Hon "termination 

It  is  possible  that  a  planning  process  may  not 
terminate*  that  is,  it  may  run  on  forever  without  either 
succeeding  or  failing.  The  expected  cost  of  the  process 
may  then  be  infinite.  Moreover,  this  may  be  true  of  a 
desirable  process,  for  example,  one  which  succeeds  after 
unit  effort  with  probability  0.99,  but  runs  on  forever 
otherwise.  If  we  terminate  this  process  after  one  unit,  we 
convert  it  to  one  which  succeeds  with  probability  0.99  and 
fails  otherwise.  Thus  by  establishing  an  upper  cost  limit, 
and  regarding  cases  exceeding  this  as  failure,  we  can 
solve  the  problem  of  non- ter mination . 

How  should  this  upper  limit  be  established?  It  is 
introduced  to  make  desirable  processes  appear  desirable; 


it  would  seem  logical 

to  fix  it  so  as 

to 

make  the 

P 

rocess 

appear  as 

desirable 

as  possible. 

We 

shall 

der 

ive  a 

condition 

which  fflimlmizes  the  (un) 

wor 

thiness 

of 

the 

process,  t/p. 

Suppose  a  process  has  probability  of  success  p, 
failure  g,  and  non- termination  r.  Let  the  cost 
distribution  be  described  by  density  function  f(x).  This 
function  will  be  ’’improper1*  in  the  sense  that  it 
incorporates  a  Dirac  delta  of  magnitude  r  at  infinity.  So, 
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but 


j-OO 

f (x) dx 

Jo 

Now, 

1st  pT  b 

with 

cost  less 

that 

the  cost 

fail 

ure.  Then 

=  1 

e  the  probability 
than  T;  similarly 
distribution  is 


that  the  process  succeeds 
define  qT.  Now,  ass urn e 
independent  of  success  or 


p  =  pP[  x<T  ]  =  pF^T) 

T 


where 


rT 


FIT)  = 


J  0 


f  (x)  dx 


Now,  let  us  choose  T  to  maximize  pT/tT,  where  tT  is  the 
expected  cost,  given  that  the  cost  is  less  than  T: 


t 

T 


xf  (x)  dx 
0 _ 

F  <T) 


E[ t j  t<T  ] 


To  maximize  pT/tT,  we  solve  (d/dT)  ipT/tT)  =  0 

T 

_d  f  p  1  =  p_d_f  f  (x)  dx  =  pf  (T) 
d?  L  TJ  dT  J 0 


F  (T) 

f  1  =  _  dT 

L  tJ 


xf  (x)  ax 
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F  (T)  Tf  (T)  -  FIT)  t  f  (T) 

 T 


[F{T) 

Tf  iTJ  -  t  f(T) 
T 


F  (T) 


Now 


d 

rp  -1 

m 

Q 

t  - 

=  T  dT 

-p  - 

T  dT 

t 

.  V1. 

dT 

t 

rp 

-i.  w 

t* 

T 

t  pf(T)  -  p£ Tf  (T)  -  t  f  (T)  ] 
=  _T _ T _ 

t2 

T 


2llll  [ 2t  -Tj 
-  T 

T 

This  is  zero  iff  T  =  2tT.  Now  as  T — ^oo,  tT  remains 
finite,  so  there  will  be  some  X  such  that  T  >  2tT  for  all 
T  >  X,  so  that  PT/tT  decreases  for  T  >  X.  Also,  if  there 
are  values  Y  which  satisfy  Y  =  2tY,  then  the  largest  value 
must  be  a  maximum  (or  inflection  point) . 

The  behavior  of  pT/tT  at  T  =  2tT  can  be  determined  by 
evaluating  (d/dT)  { 2tT - T )  since  [pf{T)J/t*T  >  C. 
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_fL_  (2t  -T)  =  2f  (Tj_  (T-t  )  -1 
dT  T  F{T)  T 

At  t  =  .  5T  , 

=  TflT)  -1 
F  (?) 

This  is  less  than  zero,  ie.  pT/tT  is  a  maximum,  if  Tf (T)  < 

F (T)  ,  that  is,  if  f  (T)  is  less  than  the  uniform  density 
function  having  the  same  F  (T)  . 

The  same  threshold  can  be  applied  to  a  discontinuous 
distribution,  with  the  understanding  that  T  can  be  less 
than  2 1 T  if  there  is  zero  probability  that  I  <  x  <  2tT. 
That  is, 

T  =  min[  2t  ,  inf{t  :  F  (t)  =p  +  q}  ] 

T 

would  be  adequate. 

There  are  cases  where  no  sensible  solution  io  T  =  2tT 
exists.  For  example,  with  the  exponential  distribution 
f  (x)  =  ge”9X  ,  x>0,  the  only  solution  is  T=0  (In  fact,  T=0 

for  any  monotonic  decreasing  distribution  function) . 

However,  this  corresponds  to  our  intuition  in  this  case: 
the  threshold  should  be  "small”,  after  which  we 
reconsider,  retaining  the  option  of  continuing  with  the 
remaining  process  (which  will  still  be  exponential) . 

This  points  up  a  feature  of  the  cutoff  threshold 

deserving  comment.  We  introduce  it  only  to  make  a  process 
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appear  as  desirable  as  is  reasonable;  this  does  not  mean 
that  we  are  comitted  to  terminating  the  process  if  it 


exceeds  the 

threshold.  Ail 

we 

must  do 

to 

j  ustif y 

the 

appearance 

is  reconsider 

the 

process 

when 

we  reach 

the 

thres  hold . 

The  cutoff  T  =  2t T  was  derived  assuming  the  cost 
distribution  was  independent  of  success  or  failure®  With 
practical  planning  processes,  this  is  often  not  the  case. 
For  example,,  many  processes  fail  only  after  they  have 
tried  all  possibilities  for  success ,  that  is,  low  cost 
implies  success.  However,  this  is  the  same  as  the  case 
analyzed,  with  q=0.  So,  it  seems  the  result  is  stronger 
than  the  case  from  which  it  was  derived.  While  more 
complicated  cases  resist  easy  solution,  the  method  still 
should  serve  as  a  reasonable  approximation. 
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CHAPTER  4 


A  STRIPS  WORLD  APPLICATION 


a » 1  General  descr iption 

my  hand-simulated  STRIPS-type  robot  uses  the  world 
model  and  operators  described  in  the  original  STRIPS  paper 


[  5  ],  but 

operates  in  the  style 

of  LAWALY. 

Assertions  are 

stored 

in  predicate  infix 

form. 

for 

example  (Doorl 

Connects 

Rooml  RooaS)  has  the 

form 

(Argumentl  Predicate 

Argument!  Arguments) .  Operators  follow  STRIPS  closely,  and 
are  given  in  functional  notation,  with  a  Precondition 
statement,  a  Delete  list,  and  an  Add  list.  These 
specifications  are  described  in  detail  in  the  Appendix. 

Considerable  simplification  of  the  general  proposal 
of  Section  3  above  was  necessary  to  enable  a  manageable 
hand  simulation.  The  principle  simplification  is  to  use 
probability  estimates  uniformly  equal  1,0.  It  turns  out 
then  that  the  two  methods  of  backing  up  over  OR  nodes 
(Section  3.5.1)  are  identical.  Also,  the  technique 
developed  in  Section  3,5.2  for  scheduling  ANDed  subgoals 
was  not  implemented;  these  are  simply  scheduled  left  to 
right . 

Given  a  goal,  the  system  begins  with  the  left-most 
literal,  by  checking  its  truth  against  the  world  model.  If 
it  is  not  currently  true,  it  becomes  an  OR  node  with  all 
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operators  adding  the  literal's  predicate  as  the  disjuncts. 
To  each  operator  is  attached  its  precondition  as  a 
subgoal;  the  system  then  continues  by  attacking  the 
left-most  literal  of  the  precondition  in  the  manner  just 
described.  This  back  chaining  continues  recursively  until 
all  the  bottom  level  goals  are  true  in  the  world  model. 
Choices  along  the  way  are  made  by  the  decision  criterion 
of  Section  3  above.  This  includes  both  choices  at  OR  nodes 
and  the  instantiation  of  free  variables ,  which  are  treated 
as  the  OR  of  their  possible  instantiations.  Free  variables 
are  instantiated  left  to  right.  For  example,  the  goal  ($X 
Nextto  ST)  first  becomes 

($X  N  $ Y )  OR _ _{$X  N  $Y) 

I 

1 

l$X1  N  $T) 

The  alternative  is  implicitly  restricted  to  X  *  XI. 

A  stage  of  elaboration  consists  of  expanding  the  plan 
until  a  new  free  variable  is  instantiated. 

To  develop  the  cost  estimates  for  goals,  let  us  first 
consider  the  simple  goal  (I  Nextto  B1) .  Clearly,  the  cost 
of  achieving  this  will  be  a  function  of  the  number  of 
doors  on  the  route  from  the  location  of  the  robot  to  the 
box.  If  they  are  in  the  same  room,  the  cost  is  easily 
determined,  since  the  plan  is  known: 
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(I  N  B 1 )  OR _ G2  CB  1)  OB _ .AND 

_ _ _ I  _1 _ _ 

3  3  3 

IF)  c3  1  IN  $X)  (I  IN  R 1 } 

•  •  ® 

•  • 

true  (31  IN  R1)  (I  IN  HI) 

Now,  we  need  a  numerical  cost  function*  Suppose  we  assign 
unit  cost  to  each  of  the  following  planning  operations: 

1.  Performing  a  match  of  a  literal  against  the  world 
model.  Eg,  Testing  the  truth  of  (31  IN  $X)  to  get  (8 1 
IN  R 1 ) 

2,  Attaching  a  node  in  the  planning  tree. 

For  execution  cost,  assign  ”a!t  units  for  each  operator 
application.  These  are  admittedly  simple  cost  criteria, 
but  the  system  and  examples  described  do  not  depend  on 
this  simplicity;  it  is  only  a  matter  of  convenience.  It 
has  the  virtue  that  no  unbounded  operation  can  get  away 
free.  The  above  plan  then  will  cost  (3*a) ,  assuming  the 
goal  node  to  be  given  without  charge  to  the  planner. 

Consider  now  the  goal  cl  IN  R) .  Clearly  the  cost  will 
depend  on  the  room  the  robot  is  in,  and  the  target  room. 

We  will  use  the  notation  r  (X)  to  refer  to  the  room 

satisfying  (X  IN  r(X)).  Let  us  also  define  j(X,Y)  to  be 
the  minimum  number  of  doors  on  a  path  joining  X  and  Y,  We 
expect  the  cost  of  goal  (I  IN  R)  to  depend  mainly  on 

j (I , R)  ,  and  will  develop  a  cost  estimate  based  on  this 

assumption.  Note  that  thus  the  robot  effectively  has 
direct  access  to  the  information  about  the  routes  between 
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any  two  rooms,  so  we  should  not  be  surprised  that  it  can 
avoid  false  steps  in  choosing  a  route.  However,  this  is 
not  unreasonable  in  an  environment  familiar  to  the  robot; 
Siklossy  also  uses  a  similar  direct  access  of  routes  with 
LAWALY  [  10  J. 

If  j  (I ,  R)  =  0,  then  R=r (I)  and  (I  IN  R)  is  true  in 
the  initial  model.  The  goal  (I  IN  R)  then  has  cost  0  (Once 
it's  been  placed  in  the  planning  tree  and  evaluated). 

Consider  now  the  given  world  with  goal  (I  IN  R5)  ; 
since  r (I)  =  R1,  j(I,R5)  =  1.  Planning  proceeds: 

(I  IN  R 5 )  OP. _ GD($D  $R  R5)  OR _ AND 

_ l _ 

2  13  I 

i$D  C  SR  R5)  (?)  (I  IN  *S)  (I  N  *D) 

When  we  bind  variables,  we  back  up  the  substitutions  and 
add  an  alternative: 

(I  IN  R5)  OR _ (I  IN  R5) 

3 

I 

GD  {D 1  R 1  R5)  OR _ AND 

_ _ 1 _ _ _ - 

III  i 

(D 1  C  R 1  R 5 )  (?)  (I  IN  R 1 )  (I  N  D  1 ) 

Note  that  the  alternative  is  added  to  the  nearest  ancestor 
for  which  there  is  an  evaluation;  thus  the  GD  operator 
level  is  bypassed.  The  cost  is  7  for  the  new  nodes  added 
and  u  for  matching,  for  a  total  of  11. 


The  remaining  cost  via  Dl  is  now  only  the  cost  of  (I 
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N  D1)  plus  a,  since  the  others  are  already  true.  From  the 
Appendix,  (I  N  D1)  is  characterized  by 


t  =  16-3-a 

f  =  1 . 0 (2a) 

s  =  1 6 

So,  the  remaining  cost  via  D1  is  16  +  2a  and  zhe  GD  (D 1  HI 
H5)  branch  is  char acteriz ed  by 

t  =  1 6*2a 

f  =  1 . 0 (2a) 

s  =  16 

How  shall  we  characterize  (I  IN  R5) ?  Suppose  we  wish  to 
search  the  rooms  connected  to  R5  in  sequence  until  we  find 
a  particular  R  for  which  =  j(I,R5)-1  =  0,  The 

expected  number  of  searches  will  depend  on  the  probability 
of  success  on  each  search;  assume  this  is  0.5,  Then  we 
expect  2  searches,  after  which  the  remaining  cost  will  be 
exactly  the  16+2a  obtained  above.  We  have  also  shown  that 
each  search  costs  11.  The  expected  cost  for  (I  IN  R5)  then 
is  2*11  +  16+2a  =  3 3+  2a ,  and  the  goal  can  be  characterized 


t  =  38+2a 
s  =  11 


f 


5  (16  +  2a) 
5  (3 8  + 2a) 


since  if  a  search  fails,  the  expected  remaining  cost  is 
the  same. 

The  characterization  for  j  (I ,  R)  =  2  can  be  determined 
similarly,  as  can  charact eri zations  for  the  other  unary 
goals;  these  are  tabulated  in  the  Appendix.  We  note  that 
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this  cost  structure  results  in  the  search  behavior  we  had 
assumed . 

4 . 2  Example:  ($X  N  $Y) 

Backing  up  of  characterizations  and  re-evaluating 
choices  is  illustrated  by  the  goal  ($X  N  $Y)  ,  ‘'achieve  one 
box  next  to  another".  Let  us  trace  through  its  planning  in 
detail,  for  the  initial  world  of  Figure  19;  the  planning 
tree  is  shown  in  Figure  20, 

First,  the  planning  tree  is  initialized  to  just 

1 ($X  N  $Y) 

We  use  numbers  before  nodes  to  indicate  the  order  in  which 
they  are  generated.  This  goal  is  matched  against  the  world 
model;  the  match  fails,  so  planning  begins: 

1($X  N  $Y) OR - 3{$1  N  $Y) 


2(31  N  SI) 

Now  there  is  a  choice  between  expanding  2  and  3;  their 


acterizai 

:ions  are 

C2:  s 

=  2 

f  =  .2(1 43  +  6a)  ;  .8{152  +  6a) 

t 

=  ^153  +  6a) 

C3 :  s 

-  o 

—  JU 

f  =  .2(63  +  2a);  ,8(73  +  2a) 

t 

=  (7  3+ 2a) 

starting  with  each  are: 


Costs  of 


. 
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FIG.  19 


initial  WORLD  FOR  ($X  N  $Y) 
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c2  =  2+1.0(73  +  2a)  =  (75+2a) 

c3  =  2+ 1. 0[ . 2 (63+2a)  +. 8  (73  +  2a)  J  =  (73+2a) 

So  3  is  chosen.  The  characterization  of  node  1  is  updated 
to  the  backed  up  OR  (  =  C3) : 

Cl:  s  =  2  f  =  . 2(63  +  2 a)  ;  .8  (73+2a) 

t  =  (7  3 +  2  a) 

A  like  process  continues  until  8  nodes  have  been 
created,  that  is,  all  possible  boxes  have  been  considered 
for  X.  At  this  point,  C2  =  C4  =  C6  and  08  is  different: 

08:  s  —  2  f  =  .2  ( 98+  4a)  ;  .8{1Q8+4a) 

t  =  108+4a 

and  by  backing  up,  08  —  >  07  — >  05— >  03— >  Cl,  since  7 
is  chosen  over  6,  5  over  4,  and  3  over  2.  Our 

characterization  of  node  1  is  altered  for  the  first  time. 

Now  (B4  N  $Y)  is  expanded;  the  alternative  is  always 
chosen  so  that  after  13  nodes  have  been  created,  012  = 

C 1 3 : 

012,013:  s  =  64  f  =  .  5  ( 90+6a)  ;  .5(112+6a) 

t  =  (1 6  5  +  6a) 

and  C12— >  CIO— >  08,  so  our  characterization  of  node  8 

changes.  This  in  turn  causes  backing  up  to  proceed 
further;  now  08  — >  07  and  node  6  becomes  preferred  to  7, 
so  Co  —  >  05  =  04  =  03  =  02  =  01  .  That  is. 


. 


' 


' 


' 


;  .  •  J  ' 


■ 

■ 


. 

' 


■- 


87 


Cl:  s  =  2  f  =  .  2  (143  +  6a)  ;  .  8(153*  6a) 

t  =  053^6a) 

Between  identical  subgoals,  the  first  is  chosen*,  so  now 
t B 1  N  $ Y)  is  expanded. 

Since  the  instances  are  all  bad,  all  remaining 
possibilities  for  I  are  considered,  until  18  nodes  have 
been  created.  Then 

CIS:  s  =  64  f  =  .5i90  +  6a);  .5(112  +  6a) 

t  =  (165  + 6a) 

and  C 18  — >  Cl7-->  C15— >  C2,  since  C14  =  C16  =  C18. 

Now  node  3  is  chosen  over  node  2,  and  since  4  and  5 
are  equivalent,  4  is  expanded. 

When  22  nodes  have  been  created, 

C22:  s  =  2  f  =  ,2(l43  +  8a);  .8*153  +  6a) 

t  =  (154*6a) 

C21:  s  =  64  f  =  *5{6Q+6a);  ,5i90  +  6a ) 

t  =  (I43*6a) 

so  21  is  chosen,  and  the  characterization  backs  up  to  20, 
4,  3  and  1.  The  choice  path  is  still  to  node  21,  so  the 

fully  instantiated  {  at  this  level)  subgoai  {B2  N  B3)  is 
expanded . 

Elaboration  continues  in  straightforward  fashion 
until  after  93  nodes  have  been  created,  the  plan 


. 


- 


. 


* 

' 


. 


88 


Goto2  (Door  1) 

Gothr uDoor  (Door 1  from  Rooml  into  RoomS) 
Goto2  (Door3) 

Gothr uDoor (Door3  from  RoomS  into  Room3) 
Goto2  (Box2) 

Pushto (Box2  to  Box3) 


is  produced. 


This 

exam  pie 

shows  how 

my  proposal  can 

abandon 

choices 

as 

soon 

as  they  become  unattractive. 

without 

planning 

th 

em  to  co 

mpletion.  For 

example,  the  choice  to 

plan  (B4 

N  $  Y)  w 

as  abandoned 

before  a  complete 

plan  was 

formed. 

Th 

is  is 

in  contrast 

to  most  robot 

plannin g 

systems,  which  employ  only  failure-driven  back  up. 

While  this  in  itself  is  an  achievement,  it  is  no 
better  than  the  performance  of  MULTIPLE  or  other  simple 
evaluation  function  methods.  The  need  for  a  look-ahead 
feature  only  becomes  clear  in  the  next  example. 

fU3  Example:  (Q  ST  ON) 


Let  us 

now  look  at 

the  STRIPS 

version 

of  the 

monkey 

bananas 

problem : 

turning  on 

a  light. 

which  r 

eguires 

a  box 

to  reach  th 

e  switch.  W 

e  use  the 

initial 

wori  d 

model  of  Figure  21.  The  goal  elaborates  as  in  Figure  22. 

After  22  nodes  have  been  created,  node  22  {like  nodes 


1 ,  8  and  15)  is  characterized 
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FIG.  21 


INITIAL  WORLD  FOR  (Q  ST  ON) 
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C22:  s  =  11  f  =  .2(1 52+8a) ;  .2{174+8a); 

t  =  ( 1 90 . 5  +  8a )  .  6(190. 5  +  8a) 

and  node  16  will  be 


C16  s  =  64  f  =  .5(99+  8a)  ,  .5(121  +  8a) 

t  =  (174  +  8a) 


The  clear  choice  is  16,  and  this  will  back  up  to  node  1  so 
Cl  =  C8  =  C15  =  C16.  The  remainder  of  the  elaboration  is 
straightforward  until  after  107  nodes  have  been  created 
the  plan  emerges: 


Goto2  (Door  1) 

GothruDoor  (Door 1  from  Rooml  into  RoomS ) 

Push  (Box3  to  Dcor4) 

Pushthr uDoor (3ox3  through  Dooru  from  Room5  into  Room4) 
Push  {Box3  to  switch  Q) 

ClimbUp (on  Box3) 

Turnon  (switch  Q) 


There  are  two  important  observations  about  this  plan. 
First  is  that  the  optimal  plan  involving  Box4  is  not 
considered.  Once  Box3  has  been  found,  the  system  does  not 
"believe”  the  added  effort  to  search  for  a  better  box  to 
be  worthwhile.  In  this  case  the  "belief"  is  wrong: 
expanding  node  22  at  a  cost  of  11  will  reduce  remaining 
cost  by  (174  +  81)  -  ( 1 5 2  +  3 a )  =  22.  Of  course,  to  quote 
Feldman  f 

.  .  „  if  we  use  the  expected  utility  as  a 
measure  when  searching  for  good  plans,  we  do  not 
guarantee  good  outcomes,  only  good  strategies. 

A  more  subtle  observation,  however,  is  this:  the 
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search  continued  after  Box2  had  been  tried  (node  14)  ,  even 
though  the  expected  cost  with  Box2  is  less  than  the 
expected  cost  of  ” trv  another  box”^. 

(196+Sa)  <  „2(174+8a)  +  ,2(152*8a)  +  .2(196  +  8a) 

+  .  4  (2 86  + 1 2a)  =  (218. 8+9. 6a) 

This  illustrates  the  advantage  to  be  gained  from  one  level 
of  look-ahead  in  our  characterization  of  alternatives:  a 
smart  search  strategy  must  do  better  than  merely  picking 
at  random  (the  result  with  a  simple  characterization) . 


* 
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CHAPTER  5 


CONCLUSIONS 

This  thesis  addresses  the  problem  of  optimizing  the 
cost  of  a  plan,  including  both  the  planning  and  execution 
costs.  An  explicit  proposal  for  doing  this  has  been 
developed,  based  on  classical  statistical  decision  theory. 
The  result  Is  an  evaluation  function  approach  in  the  style 
of  MULTIPLE  or  A* ,  but  the  evaluation  is  in  terms  of 
changes  in  expected  cost  and  probability,  rather  than  just 
probability  or  just  expected  cost.  This  refinement  was 
introduced  after  simple  evaluation  functions  were  argued 
to  be  inadequate  (Section  3.2).  While  my  system  can  be 
considered  as  a  generalization  of  MULTIPLE  and  A*,  it 
employs  totally  different  backing  up  methods  from 
MULTIPLE;  thus  it  has  no  '’special  case”  which  reduces  it 
to  MULTIPLE.  It  does,  however,  reduce  to  an  A*  type 
heuristic  on  remaining  cost  if  probabilities  are  all  1.0 
and  the  cost  distributions  are  not  distributed  at  all,  but 
consist  only  of  a  single  point. 

When  one  attempts  to  include  the  cost  of  analysis  in 
a  decision  analysis,  one  seems  led  to  an  infinite  regress: 
what  is  the  cost  of  analyzing  the  cost  of  analysis?  In 
practice,  one  need  never  pose  this  question;  one  merely 
supplies  cost  estimates  for  processes  which  include  the 
cost  of  analysis.  Thus  the  analysis  necessary  to  choose 
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among 

processes 

ithe 

computations 

of  Section  3.3  -  3.5 

above 

)  is  included 

as 

an  "overhead" 

in  the  individual 

proce 

ss  es . 

This  avoids  the  question  of  how  these  overhead  costs 
Slight  be  determined,  or  of  how  they  are  finite  if  they 
hide  an  infinite  regress.  Firstly,  they  are  finite  because 
the  proposed  decision  criterion  limits  the  regress  to  one 
step.  Secondly,  they  can  be  determined  by  merely  computing 
the  expected  number  of  times  the  continued  pursuit  of  the 
process  will  be  reconsidered,  since  each  reconsideration 
involves  fixed,  finite  cost.  Even  more  directly,  such  cost 
estimates  could  be  acquired  automatically  by  recording 
observed  costs  of  a  process  for  various  situations.  The 
fact  that  performance,  and  therefore  cost,  is  dependent  on 
the  cost  estimates  themselves,  should  result  in  a  learning 
process  which  propagates  information  by  iterative 
refinement  of  cost  estimates.  This  hope  is  strengthened  by 
the  observation  that  the  dependency  seems  to  be  one  way; 
estimates  for  complicated  processes  or  situations  depend 
on  the  performance  with  simpler  ones.  Thus,  look-ahead 
cost  estimates  seem  much  easier  to  acquire  than  one  might 
think;  the  "infinite  regress"  seems  more  a  conceptual 
problem  than  a  real  one . 


The  decision  proposal 
weaknesses,  which  seem  a 


developed  here 
necessary  evil 
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First  is  the  problem  which  emerged  when  backing  up 
char acter izations  of  alternatives  at  an  OR  node  (Section 
3.5.1).  Using  the  logically  developed  method,  it  emerged 
that  (A  OR  3)  could  be  a  worse  alternative  than  A  alone,  a 
logical  contradiction.  The  solution  adopted  was  an  ad  hoc 
"patch  up,? ,  at  the  expense  of  logical  consistency.  This 
difficulty  is  rooted  in  the  assumption  that  there  is  no 
alternative  to  (A  OR  B)  ;  this  assumption  is  usually  not 
valid.  Removing  it,  however,  is  not  easy,  for  then  the 
evaluation  of  any  single  alternative  depends  on  every 
other  alternative  in  the  planning  tree.  This  is  what  we 
were  trying  to  avoid  in  the  first  place  with  only  one 
level  of  look-ahead.  Despite  much  deliberation,  there 
seems  to  be  no  easy  way  out. 

Second  is  the  handling  of  backing  up  at  AND  nodes. 
Here,  we  were  forced  to  devise  an  ordering  of  the  subgoals 
so  that  the  first  step  of  one  subgoal  could  become  the 
first  step  of  the  AND.  This  was  dictated  by  the 
representation  of  the  distributions;  if  distributions  were 
represented  by  their  moments,  the  computational  problem  of 
convolving  them  (Section  3.5.2)  would  disappear,  and  we 
could  let  the  first  step  of  the  AND  be  the  combined  first 
steps  of  the  subgoais.  This  would  be  more  natural  if  the 
subgoals  were  similarly  characterized,  ab>  we  expect  within 
one  level  of  hierarchical  planning. 


Thus  we  see  that  there  are  other  possibili  uieo  j-Ot 
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the  development  of  a  look-ahead  planner  than  those 
explored  here.  While  these  were  bypassed  at  the  time,  we 
should  now  (like  the  planner  we  are  trying  to  develop) 
reconsider  these  choices.  What  should  be  done  differently, 
if  this  work  were  to  be  done  again?  I  believe  that  logical 
consistency  is  a  greater  goal  than  some  of  the  choices 
made  would  reflect;  this  would  dictate  making  realistic 
assumptions  about  alternatives  available  when  computing 
costs.  A  method  should  be  sought  to  make  this 
computationally  feasible.  If  I  could  offer  a  more  concrete 
suggestion  as  to  how  this  could  be  achieved,  I  surely 
would.  However,  I  can  offer  the  following  speculations. 

Having  established  the  need  for  characterization  of  a 
process  by  both  cost  and  probability  in  Section  3.3.1,. I 
would  like  to  offer  some  hope  of  relaxing  this 
complication.  Probability  estimates  were  introduced  to 
avoid  infinite  expected  cost  when  failure  was  a 
possibility.  However,  as  we  then  saw  in  Section  3.3.3,  we 
used  these  cost  estimates  in  such  a  way  that  the 
infinities  cancelled  out.  Could  it  be  that,  following  the 
lead  of  quantum  mechanics,  we  could  simply  ignore  the 
troublesome  infinities?  We  could  then  treat  failure  merely 
as  the  expected  cost  of  continuation  becoming  infinite, 
and  characterize  processes  by  a  simple  one  dimensional 
probability  distribution. 


This  is  not  so  easy  as  it  sounds;  the  conceptual 
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hurdle  of  floating  infinities  is  high.  However,  the 

situation  that  gave  rise  to  it  may  be  artificial.  The 

situation  we  have  considered  involves  a  top  level  goal 
with  no  alternatives,  but  which  admits  the  possibility  of 
failure.  Realistically,  a  goal  like  (Boxl  Nextto  Box2) 

never  has  such  a  high  priority.  Living  systems  always 
possess  an  alternative,  and  have  a  top  level  goal  like  "Do 
the  best  you  can";  such  a  goal  cannot  fail.  Local 

"failure"  then  is  not  infinitely  costly  when  there  is  a 
global  alternative.  Stating  this  another  way,  subgoals 
must  be  assigned  realistic  benefits  (inverse  costs) , 
rather  than  the  implicit  infinite  benefit  of  a  goal 
without  alternatives. 


This  would  lead  to  a  change  in  our  assumed  strategy 
for  a  goal  like  (A  OR  B) ;  it  would  no  longer  be  either 
"try  A,  if  that  fails,  try  B",  or  "try  B,  if  that  fails, 
try  A",  because  the  cost  might  outweigh  the  benefit.  The 
result  would  be  a  changed  decision  criterion  and  a 
different  way  of  backing  up  distributions,  hopefully  a 
change  enabling  logical  consistency. 


Despite  these  deficiencies,  a  powerful  new  decision 
maker  for  use  in  planning  has  been  proposed,  one  which 
takes  into  account  the  cost  of  planning.  It  should  be 
noted  in  this  regard  that  planning  and  execution  are 
treated  as  merely  two  types  of  the  same  activity,  and  so 
the  techniques  developed  here  are  useful  for  execution 
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monitoring  as  well  as  guiding  planning.  The  significant 
difference  is  one  of  re versabiiitv ;  when  planning,  one  can 
simply  return  to  a  bypassed  choice  with  no  cost,  whereas 
with  execution,  one  has  to  pay  to  re-establish  a  previous 
state  of  the  world  *if  one  can  do  it  at  all).  However, 
this  is  just  another  instance  of  interdependence  among  the 
costs  of  alternatives;  ignoring  it  should  be  no  more 
detrimental  than  ignoring  other  interdependencies. 

We  gain  insight  into  the  power  of  the  look-ahead 
planning  proposal  by  considering  it  as  a  meta-planner, 
choosing  among  planning  processes.  With  a  simple 
evaluation  function,  the  meta- planner  intervenes  only  once 
at  the  start  of  planning,  to  select  a  particular  planning 
process  to  achieve  the  goal;  any  subsequent 
reconsideration  is  inconsistent  with  a  simple  evaluation 
function  (see  Section  3.2  above).  The  proposed 
met  a- planner ,  however,  "knows**  it  will  intervene  again; 
the  decisions  it  makes  depend  on  this  knowledge.  Thus,  it 
is  in  some  sense  "self  aware".  Indeed,  this  is  a  reversal 
of  the  feedback  approach  usual  in  process  control;  rather, 
it  is  "feed  forward".  That  is,  the  proposed  meta-planner 
not  only  corrects  its  past  errors,  but  anticipates  its 
current  ones.  This  self  awareness  or  anticipation  of  self 
is  a  much  lauded  feature  of  human  intelligence. 
Incorporating  it  into  a  planning  system  as  done  here  thus 
seems  like  a  valuable  step  on  the  road  to  artificial 
intelligence. 
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APPENDIX  I 

ROBOT  SPECIFICATIONS 


Conventions 

-All  assertions  and  operators  may  contain  lower  case 

letters;  these  are  for  readability  only.  Thus  (Doorl 
Connects  Rooml  SoomS)  is  an  explainatory  elaboration 
of  (D  1  C  El  R 5) 

Assertions  are  written  in  predicate  infix  notation 
(Argument  1  Predicate  Argument!  Arguement3  .  ..} 

-Operators  are  written  in  functional  notation: 
Operator-name  (Argument 1  Arguement!  . . « ) 

-Arguments  and  predicates  are  delimited  by  blanks  (and 
lower  case  letters) 

-The  relevance  of  an  operator  to  a  unit  (sub) goal  is 

determined  by  matching  the  literals  marked  with  #  on 
its  add  list  with  the  goal  literal.  Matching  of  the 
onerator  arguments  is  assumed  to  be  restricted  in  the 
obvious  way  (eg.  the  argument  of  Go t o 1  is  restricted 
to  locations,  the  arguments  of  Pushto  are  restricted 
not  to  be  the  robot,  etc.) 

-Elaboration  of  conjuncts  proceeds  left  to  right,  in  the 
manner  of  PLANNER  or  LANALY.  That  is,  in  planning 
Pushto (X,Y),  the  system  will  fully  elaborate  a  plan 
for  (I  Nexto  X)  before  elaborating  the  subgoal  (X  IN 
r(Yj)  any  further.  Also,  execution  of  the  leftmost 
subgoals  is  modelled  in  the  manner  of  STRIPS;  the 
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result  is  that  the  input  world  model  to  any  subgoal 
is  fully  determined  before  that  subgoal  is 
elaborated. 

Once  an  OR  node  has  been  expanded,  a  choice  is  made  using 
the  proposed  decision  criterion.  This  may  require 
further  expansion  to  enable  the  characterizations  of 
the  ORed  alternatives  to  be  evaluated.  For  example, 
in  Figure  <tree  for  (Q  ST  ON) >,  nodes  3  through  7 
must  be  added  to  evaluate  node  2. 

A  unit  (sub) goal  is  treated  as  an  OR  node;  its  successors 
are  the  relevant  operators  resulting  from  add  list 
matching. 

Free  variables  are  denoted  by  SYariable- name.  These  are 
bound  by  matching  the  first  literal  (unit  subgoal) 
involving  them  with  the  world  model.  If  an 
alternative  instanstiat ion  is  possible,  an 
alternative  (sub) goal  is  sprouted  from  the  nearest 
ancestor  node  for  which  there  is  a  tabulated 
evaluation  function. 

An  illustration  of  the  preceeding  2  rules  is  from 
Figure  Ctree  for  (Q  ST  ON) >.  Node  1  starts  as 

1  (Q  ST  ON)  OR 

I 

I 

2  T  (Q)  OR 

since  there  is  only  one  match  for  (Q  ST  ON)  in  the 
operator  add  lists.  No  choice  is  required  of  the 
unary  OR.  When  node  3  is  expanded,  a  free  variable  $X 
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is  introduced  in  node  5,  the  subgoal  ($X  Type  Box) . 
Matching  produces  the  instantiation  Boxl  for  X  and 
the  availability  of  alternative  boxes.  Nodes  3  and  2 
have  no  tabulated  evaluation  functions,  but  node  1 
does,  so  the  alternative  node  8  is  sprouted. 

•Free  variables  generated  together  are  instantiated  left 
to  right.  The  ($X  Nexto  $Y)  example  (Section  4.1) 
illustrates  this. 


Initial  world  model 


c 

Room  1 

RoomS) 

(Door  1 

Connects 

RoomS 

Rooml ) 

s 

Room2 

RoomS) 

(Door2 

Connects 

Room5 

Room2 ) 

s 

Room3 

RoomS) 

(Door3 

Connects 

RoomS 

Room3) 

s 

Roomh 

Room5) 

(Doorh 

Connects 

Room5 

Room4) 

AT  AA) 

(Box2  AT 

BB) 

AT  CC) 

(Switchl 

AT  DD) 

IN  Rooml) 

(Box2 

IN  Room 

IN  Rooml) 

(I  IN 

Rooml ) 

hi  IN  Rooml 

) 

Type  Box) 

(Box2 

Type  Bo 

Type  Box) 

(Door 

4  Type  D 

Type  Door) 

(Door2  'Type 

(Doorl  Type  Door) 

(Boxl  Pushable) 

(3 ox3  Pushable) 


(Switchl  Type  Switch) 
(3ox2  Pushable) 


(FF  Located  in  Roomh) 
(EE  ATR) 

(onFloor) 

(Switchl  STatus  OFF) 


ODerators 


Gotol  (M)  Robot  goes  to  location  M 
Preconditions : 

(M  LOC  $R )  (F)  (I  IN  R) 
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Delete:  ($  ATE)  {I  N  $) 

Add:  #(H  ATE) 

Goto2  (X)  Eobot  goes  nest  to  item  X 
Preconditions : 

(F)  (X  IN  $R)  (I  IN  E) 

OR  (X  C  $R  $T)  {?)  (I  IN  E) 

OR  (X  C  $R  ST )  (?)  (I  IN  T) 

Delete:  ($  ATE)  (I  N  $) 

Add:  # (I  N  X) 

Pushto(X,Y)  Robot  pashes  object  X  next  to  item  Y 
Preconditions : 

(X  P)  (?)  (I  N  X)  (Y  IN  $E)  (X  IN  E) 

OR  (X  P)  (X  C  $R  $T)  {?)  {I  N  X)  (X  IN  R) 

OR  (X  P)  (X  C  $R  $T)  (F)  (I  N  X)  (X  IN  T) 

Delete:  {$  ATE)  { I  NS)  (S  N  X)  (X  AT  $)  {X  N  S) 

Add:  #{X  N  Y)  #  (Y  N  X) 

(I  N  X) 

Turnlight  (Q)  Robot  turns  on  lightswitch  Q 
Preconditions: 

(Q  T  S)  ($X  T  B)  iX  N  Q)  (I  ON  X) 

Delete:  (Q  ST  OFF) 

Add:  #  (Q  ST  ON) 

CiimbUponbox (X)  Eobot  climbs  up  on  box  X 
Preconditions: 

(X  T  B)  (F)  { I  N  X) 

Delete:  {$  ATE)  (F) 

Add:  ill  ON  X) 

ClimbDownof f box  (x)  Eobot  climbs  down  off  box  X 
Preconditions: 

(X  T  B)  (I  ON  X) 

Delete:  (I  ON  X) 

Add:  #(F) 

GothruDoor (Df R, T)  Eobot  goes  through  Door  from  Room 
into  room  T 
Preconditions : 

(D  C  E  T)  (F)  (I  IN  R)  (I  N  D) 

Delete:  t$  ATE)  (I  N  $)  (I  IN  $) 

Add:  #  (I  IN  T) 

Pusht hr u Door (0 , D# S , S)  Robot  pushes  Object  through  Door 
from  Room  into  room  S 
Preconditions: 

10  P)  (F)  (DOES)  (I  N  0)  iO  IN  R)  (0  N  D) 
Delete :  (I  AT  S)  (I  NS)  (ON  $)  ($  N  0)  (0  AT  S) 

(I  IN  $)  (0  IN  $) 

Add:  #  (0  IN  S) 

(I  IN  S)  (0  N  D)  (IN  0) 


■ 

i 
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Characterizations  of  unit  subgoals 

The  following  tables  give  the  characterization  of 
selected  unit  subgoals.  For  each  unit  subgoal,  several 
characterizations  are  provided  for  various  input  world 
situations . 

The  followint  notation  is  used  in  describing 
situations : 

1.  r(X)  the  room  containing  X 

2.  j  (X ,  Y)  the  minimum  number  of  doors  on  a  path 
from  r  (X)  to  r  { Y)  . 

3.  Rooms  fall  into  two  categories:  firstly,  the 

hall,  R5,  designated  h,  and  secondly  the  others, 
designated  u,v,w;  different  symbols  denote 
distinct  rooms.  That  is,  the  situation  (I  IN 
R1),  (30X1  IN  Rl )  would  be  designated  u,u  while 

(I  IN  Rl),  (BOX  1  IN  R2)  would  be  designated  u,v. 

4.  Characterizations  are  the  expected  cost  t, 
initial  cost  s,  and  a  distribution  f(t®),  This 
distribution  Is  always  a  discrete  set  of  points, 
and  is  described  by  a  notation 

f:  ql(tl),  q2~(t2)  ,  .  .  . 

y hare  value  tl  occurs  with  probability  ql ,  etc. 

5.  The  operator  Pushto  also  has  a  characterization, 
to  allow  P  (X  Y)  and  P  (Y  X)  to  be  distinguished. 


(I  IN  R) 


s 

£liil 

0 

0 

i 

45  +  2a 

1  1 

. 5 (23+  2a)  , 

.  5  (45+2a) 

2 

90  +  4a 

1  1 

. 5 (68+4a)  , 

•  5  ( 90  +  4a) 

(I  N  X) 


HULL 

t 

s 

iit9l 

0 

1 6  +  a 

1  6 

1 . 0  (a) 

1 

6  1  +3a 

27 

•  5  (23  +  3a)  , 

. 5  (45  +  3a) 

2 

1 06+5a 

27 

.5  (68  +  5a)  , 

. 5 (90+  5a) 

' 

0 

■  ?■ 


* 

• 

' 

' 

. 
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2±l*.l  1 

t 

s 

fit'i 

0 

0 

1 

83  +  3a 

15 

.5  (53  +  3a)  , 

2 

1 50  +  5a 

15 

.5(1 20+5a) 

0 

0 

1 

1 28  +5a 

15 

. 5  (98  +  5a)  , 

2 

19  5+7a 

15 

.5(165+7a) 

0 

0 

1 

1  73  +  7a 

15 

,5  ( 1 43+  7a ) 

2 

240+9a 

15 

. 5  (21 0+9a) 

Y) 

min 
j  d/X) 

iiltXL 

t 

7* 

s 

Hill 

0 

0 

53  +  2a 

53 

1 . 0  (2a) 

0 

1 

120+4a 

68 

.  5 (37  +  4a)  , 

0 

2 

137+6a 

68 

.5 (104+ 6 a)  , 

1 

0 

98  +  4a 

64 

.  5  ( 2  3  +  4  a )  , 

1 

1 

165+6a 

64 

. 5  (90  +  6a)  , 

1 

2 

232+8a 

64 

. 5 ( 1 5  7  +  8a)  , 

2 

0 

1  43  +  6a 

64 

. 5 (68+6a) , 

2 

1 

2  1  0  +  8a 

6  4 

.  5 ( 135  +  8a)  , 

2 

2 

277+10a 

64 

. 5 (202+ 10a, 

.  5  (83  +  3a) 

.  5  1 1  5 C+5a) 

. 5  ( 128+5a) 

.  5  ( 1  9  5+  7 a) 

.5  (1  73+  7a) 
,  .  5  (240+9a) 


5  »  6  7+ 4 a) 
.5(1 3  4  +  6a) 

5 (4  5+  4a) 

5  (1 1 2  +  6a) 

. 5  { 1 79  +  8a) 

5  (90+  6a) 

.5  ( 1 5 7  +  8a) 

.  5  ( 22 4  + 1 0 a) 


. 


' 
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P  (X  Y ) 


like 

above  table. 

but 

column  headings 

ail*. 

XL  aiJLe.il 

t 

s  Lit  *1 

IN  R) 

% 

- 

R 

Ei.il  t 

s 

f  It  ») 

h 

h  1 28  +  5a 

17 

.5(98+5a),  .5(128  +  5a) 

h 

u  91+3a 

2 

.  25  (83  +  3a)  ,  .  75  (91  +3a) 

u 

u  136 +5 a 

2 

.  25  (128+5a)  ,  .75{136  +  5a) 

u 

h  9 1  +  3a 

2 

.25  (83  +  3a),  .75(91+3a) 

u 

v  1 3  6  +  5  a 

2 

.25  (1  28  +  5a)  ,  ,  75  (136  +  5a) 

N  Y) 

rill 

.  Eill  E 

s  fll *1 

h 

h  63+2a 

2  . 2  (53+2a)  t  .8(63  +  2a) 

h 

u  1  Q8  +  4a 

2  .2  (98  +  4a)  ,  .8 (108  +  4a) 

u 

u  63+2a 

2  .2 (53  +  2a)  ,  .8  (63  +  2a) 

a 

h  108+ 4a 

2  . 2 ( 98+  4a)  ,  .8 ( 108  + 4a) 

u 

v  153+ 6a 

2  .  2  ( 1 43+6a)  ,  .8  (153  +  6a) 

N  $Y) 

t  = 

73  +  2a 

fit* 

):  ,2(63  +  2a)/  .8(73  +  sa) 

s  =  2 


. 


' 
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(Q  ST  ON) 

r  (I) 

£121 

+ 

s 

full 

h 

h 

1 17  +  4a 

11 

.2  (62+4a)  „  .8(1 1  7  +  4a) 

h 

u 

14  5. 5  +  6  a 

11 

.2(11  7  +  6a)  r  .  2  ( 1 2 9  +  6 a )  , 

u 

h 

1 4  5 • 5+6  a 

11 

.6  (14  5. 5  +  6a) 

.2  ( 1 1  7  +  6 a )  ,  .2(129  +  6a)  , 

u 

u 

1 1 7  +  4  a 

11 

,6  ( 1  4 5.  5  +  6a) 

.2  (62+4a)  r  . 8  (1 1 7  +  4a) 

u 

V 

190. 5+8a 

11 

. 2  ( 1 52  +  8a)  ,  .2 (174+8a)  , 

. 

. 


